1 Introduction

Current social robotic systems require interaction protocols that decrease the intuitiveness of the interaction itself, causing frustration in the user. Recently, interest has focused on measuring the efficacy of robot behaviours and the perceived intelligence of robots based on evaluations from human users [1]. Indeed, measuring human-robot interaction could suggest what to improve in the cognitive abilities and appearance of the robot, and how to improve it.

When human-robot interaction fails, the reason most often is that the robot and the human try to communicate about different things, or that the human partner has wrong expectations of the robotic partner. Several prerequisites have been identified [2, 3] regarding the features (both physical and cognitive) that allow a robot to interact effectively and naturally with a human user.

Here, we stress that robots need to reach joint attention with their users to have successful interactions. This has not been achieved so far, since joint attention requires not only visual attention to the same visual features in the environment, but also skills in attention detection, attention manipulation, social interaction and even intentional understanding [2]. Without joint attention, a robot will not be able to achieve a degree of interaction comparable to human-human interaction.

Previously, we implemented an attentive mechanism which adopts two fundamental skills for joint attention [4]. In this paper, we focus on measuring the quality of this implementation. By evaluating robot skills, we want to identify those characteristics that need to be emphasised when implementing attentive mechanisms in robots, and to identify correlations between them.

Several metrics for measuring HRI have been proposed, from measuring the ability of a robot to engage in temporally structured behavioural interactions with humans [5], to evaluating robot social effectiveness from different points of view (engineering, psychological, sociological) [6]. We adopted a series of metrics based on cognitive science studies about measuring social skills in humans and based on studies about how robots are perceived by humans and whether this perception affects the expectation humans have about robot intelligence (Godspeed questionnaire [7]).

Quantifying human behaviour usually requires the analysis of video recordings, questionnaires and interviews. In this work, we used the first two methods to quantify the quality of robot behaviour. We set up four interaction experiments between a humanoid robot and a user and recorded them. After each experiment, the user was asked to fill in a questionnaire on the quality of the interaction and on the perception of several functional and physical properties of the robot. To the best of our knowledge, very few studies have so far correlated human perception of robot skills (measured with the Godspeed questionnaire, whose reliability we tested) with proxemic distances.

In [8], Takayama and Pantofaru adopted part of the Godspeed questionnaire in their measurements, finding that people who held more negative attitudes toward robots felt less safe when interacting with them. They also studied human personal space around robots, finding that experience with owning pets or with robots decreases the personal space that people maintain around robots, and that a robot looking people in the face influences proxemic behaviour. The latter finding suggests performing proxemics analysis when measuring attentive mechanisms in robots.

The article is organised as follows: Section 2 introduces the saliency detection and attention manipulation skills implemented on the Nao robot from Aldebaran; Section 3 presents the experimental setup: the experimental procedure, the robot platform, the participant demographics, the measurements performed, the results and a discussion; finally, in Sect. 4 we summarise the achievements of the current work and how we would like to continue it.

2 Saliency Detection and Attention Manipulation

In this section, we provide a short overview of the system we implemented on the humanoid robot Nao, which provides the robot with both saliency detection and attention manipulation skills [9]. For a full description and an overview of work in this area, please see [4].

Attention is a cognitive skill, studied in humans and observed in some animal species, which lets a subject concentrate on a particular aspect of the environment without interference from the surroundings. There is evidence from developmental psychology that the development of skills to understand, manipulate and coordinate attentional behaviour lays the foundation of imitation learning and social cognition [10].

In our world, we are constantly surrounded by items, such as objects, people and events, which stand out from their neighbouring items. This property is represented by the saliency of those items. Saliency detection is an attentional mechanism through which those items are discovered; it enables humans to shift their limited attentional resources to the objects that stand out the most.

There are two approaches that can be combined—a bottom-up, pre-attentive process and a top-down process influenced by motivation. Bottom-up detection uses different low-level features (e.g. motion, colour, orientation and intensity) for saliency detection. Top-down detection relies on high-level features, and it is highly influenced by our current goals and intentions. The combination of bottom-up and top-down processes is highly inspired by similar mechanisms in humans [11, 12].
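The combination of the two processes can be sketched as a weighted sum of normalised feature maps, where the motivation system supplies the top-down gains. This is a minimal illustration, not the implementation used on the robot; the map names and weights are hypothetical:

```python
import numpy as np

def saliency_map(feature_maps, weights):
    """Combine bottom-up feature maps with top-down weights.

    feature_maps: dict of name -> 2-D array (e.g. motion, face, colour)
    weights:      dict of name -> gain set by the motivation system;
                  a weight of 0 deactivates a filter (top-down gating).
    """
    acc = None
    for name, fmap in feature_maps.items():
        fmap = np.asarray(fmap, dtype=float)
        rng = fmap.max() - fmap.min()
        norm = (fmap - fmap.min()) / rng if rng > 0 else np.zeros_like(fmap)
        contrib = weights.get(name, 0.0) * norm
        acc = contrib if acc is None else acc + contrib
    return acc

def focus_of_attention(smap):
    """Return (row, col) of the most salient point."""
    return np.unravel_index(np.argmax(smap), smap.shape)
```

Setting a filter's weight to zero reproduces the top-down gating described above: the same scene yields a different focus of attention depending on the motivation state.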

Figure 1 gives an overview of the attention mechanism we implemented on the humanoid robot Nao.

Fig. 1
figure 1

Overview of the attentive mechanism. Frames are analysed by three different filters which are activated by the motivation system. Optic flow and face detection filters feed the ego-sphere, while the marker detector filter stores objects in a different memory. The motivation system activates or deactivates filters and movements according to its current state. See Sect. 3 and refer to [4] for further information

For saliency detection, we used optic flow and face detection filters that store the information in a robot ego-sphere, and a marker detector for simplified object detection. Each feature detector represents one filter, and by applying it to the input, a saliency map is generated. The robot directs its attention to the point which has the highest saliency. Due to Nao’s computational limitations, the ego-sphere is represented with a tessellated sphere, where information about salient areas is stored in the edges of the sphere, like in [13, 14]. To simulate a short-term memory, habituation, inhibition and decay mechanisms are employed [15].
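The short-term memory mechanisms can be illustrated with a toy model (the class name and parameter values are ours, for illustration only): each node's saliency decays passively, and the currently attended node is additionally habituated, so attention eventually shifts to other salient nodes (inhibition of return):

```python
import numpy as np

class EgoSphereMemory:
    """Toy short-term memory over ego-sphere nodes (illustrative only).

    Each node holds a saliency value that decays over time; the currently
    attended node is additionally damped (habituation), so the robot
    eventually shifts its gaze to other salient nodes.
    """

    def __init__(self, n_nodes, decay=0.95, habituation=0.5):
        self.saliency = np.zeros(n_nodes)
        self.decay = decay              # passive forgetting per time step
        self.habituation = habituation  # extra damping of the attended node

    def observe(self, node, value):
        # A filter reports a salient stimulus at this node
        self.saliency[node] = max(self.saliency[node], value)

    def step(self, attended=None):
        # Advance one time step and return the next focus of attention
        self.saliency *= self.decay
        if attended is not None:
            self.saliency[attended] *= self.habituation
        return int(np.argmax(self.saliency))
```

With two salient nodes, attending the winner damps it until the runner-up takes over, which is the behaviour the decay and habituation mechanisms are meant to produce.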

Pointing is a way of manipulating the attention of someone else. It is still not clear whether this behaviour is innate or whether it results from reaching behaviours in its first developmental stage. Recognising and performing pointing gestures is very important for being able to share attention with another person [2].

We implemented learning through self-exploration on a humanoid platform [16]. We used motor babbling for learning the mapping between different sensory modalities and for equipping the robot with prediction abilities of sensory consequences (in this case, the position of the hand of the robot) from control commands applied to its neck and its arm [3]. Then, we equipped the robot with prediction abilities of arm movement commands that allowed for and resulted in pointing towards an object presented outside the reach of the robot [9].

Finally, we implemented a partially preprogrammed motivation system to show how different behaviours can result in the activation or deactivation of parts of the attention system, actually implementing a top-down approach for saliency detection, or in the activation of attention manipulation.

3 Experiment

The proposed experiment aimed at several goals: test the quality of the implemented saliency detection and attention manipulation mechanisms; identify those physical and behavioural characteristics that need to be emphasised when implementing attentive mechanisms in robots; measure the user experience when interacting with a robot equipped with attentive mechanisms; find correlations between heterogeneous robot features perceived by the participants during the exhibition of attentive mechanisms; and analyse the differences in the perception depending on the different behaviours performed by the robot.

We tested our implementations in four combinations of activated parts of the attention system, which resulted in four different behaviours:

Exploration.:

In this state, the robot is attracted by movements, faces and objects, giving the appearance of exploring the surrounding environment.

Interaction.:

This behaviour reproduces the experiment done in [9]. The robot looks and points at an object, if one is present.

Interaction avoidance.:

This behaviour implements the loss of interest and boredom. In this state the robot looks away from the object handed over by the interacting partner.

Full interaction.:

This behaviour is composed as a sequence of the previous behaviours. The first performed action is exploration. Once the robot has detected a person to interact with and an object which can be used to draw the attention of the user, its motivation state changes to interaction, and after a certain period it switches to interaction avoidance, which is followed by exploration.

For a full description of the behaviours, please refer to [4].

3.1 Hypotheses

We had several expectations about the outcomes of the experiment. We expected the level of interactiveness of the robot to be positively correlated with the level of excitement and perceived intelligence. Playing with the robot in the interaction state might be more exciting and satisfying than playing with it in the interaction avoidance one.

Multi-modal interaction (through arm and head movements) might increase the perception of interactiveness; on the other hand, a less interactive behaviour might decrease user satisfaction and cause the participants to behave nervously.

Anthropomorphic attributes might be positively correlated with the perception of intelligence.

Reaching commands can be perceived as a desire to grasp the object. This was observed in a preliminary experiment, in which the participants were asked how they interpreted the movements of the robot performing the interaction behaviour.

3.2 Procedure

The experiments consisted of the robot performing the behaviours described in the previous section in four separate interaction sessions, one for each of the four behaviours. The experiment supervisor manually activated or deactivated them. Figure 2 shows a frame taken from a typical interaction session. The user sat in front of the robot at a distance of ca. 90 cm. For each person, each interaction test lasted one minute. We recorded the interaction with a standard camera (resolution 640×480) placed at ca. 2 metres perpendicularly to the robot-user axis. Beside the table on which the robot was standing, a scale was drawn on a whiteboard for the visual estimation (estimated average error: 5 cm) of the distance between the nose of the user and the head of the robot, and between the hand of the user and the head of the robot; depending on the type of interaction, we noticed that users moved their hands closer to the robot.

Fig. 2
figure 2

Experimental setup showing interaction between the Nao and a person

After each of the four interaction sessions, the participants were asked to fill in a questionnaire about the quality of the interaction with the robot and about their perception of the robot's behaviours.

3.3 Robot Platform

The robot platform is the Nao (version 3.3) from Aldebaran, a humanoid robot around 57 cm tall. For the experiment, we used only the degrees of freedom in the arms and the neck. The lower camera is positioned below the two eyes, which meant the robot could not see an object brought close to its eyes. For that reason, two fake eyes were placed at the sides of the lower camera, and the real eyes were covered with tape.

The attention mechanism was implemented in C++ using the framework of the Nao Team Humboldt [17]. The attentional mechanism is executed fully onboard the robot; there is no remote processing of the data. The robot is connected to a computer through Ethernet; a robot control program running on the computer is used to visualise the data and to activate the required modules of the attentional mechanism in the framework.

We adopted such a robot for measuring the users’ expectations about the robot’s skills due to its anthropomorphic form. Moreover, its small child-resembling size could reduce users’ expectations, thus increasing the positive evaluation of the interactions.

Unfortunately, Nao has limited computational resources. Our implementation, in its current state, lets the robot process all the filters at a rate of approximately 7–8 frames per second. The computationally most expensive algorithms are those related to image processing, e.g. the face detection filter and the optical flow filter, which together take almost 110 ms per computation. This results in slower movements and reactions when the robot is in the exploration state and in the exploration part of the full interaction state, which, we expect, could affect the intuitiveness of the interaction. However, in a preliminary experiment, the participants rated the speed of the robot as good. An interesting research question is what movement speed a robot should exhibit in order to be perceived as harmless; we have included this topic in the planned future development of the experiment. Furthermore, in the current experiment, although the processing was fastest in interaction avoidance, people perceived the robot as less responsive than during interaction and full interaction.

3.4 Participants

In total, 28 people participated in the survey, which results in a total of 112 questionnaires (four per participant, one for each interaction). Some participants left questions unanswered, but these were only a few. It is interesting to note that a few participants gave negative or neutral responses in all four experiments, regardless of the experiment, together with comments saying that Nao did not want anything because it is a machine. This might reflect a negative bias towards robots.

Of 28 participants, 8 were female (28.57 %) and 20 were male (71.43 %). There were 17 Germans, 2 Italians, 2 Serbians, 2 Poles, 1 Czech, 1 Dutch, 1 Estonian and 1 French. Regarding previous experience with robots, 25 persons (89.29 %) had none and 3 (10.71 %) had previous experience—one with industrial robots, one with Aldebaran Nao and one with Lego Mindstorms. The average age of the participants was 28.12 (σ=5.64). Among the participants, 75 % had university level education and 25 % had high-school level education.

Unfortunately, not all the participants agreed to be filmed during their interaction, for privacy reasons (even though we informed them that the data would be kept anonymous and that videos would not be published against their wishes). The video database is composed of 10 videos for exploration, 7 for interaction, 8 for interaction avoidance and 9 for full interaction.

3.5 Measurements

Only recently have performance criteria different from those typical of industrial robots been adopted for measuring the success of social and service robots. Current criteria centre on the satisfaction of the user [18].

We decided to adopt two techniques for evaluating the interaction: questionnaires and proxemics estimated from recorded video sequences of the interaction. At this stage, we wanted to adopt only metrics related to socio-cognitive skill perception, rather than measuring the affective state of the user through physiological sensors.

3.5.1 Questionnaires

We conducted a qualitative, anonymous survey to evaluate how people perceive their interaction with the Nao. Questionnaires are often used to measure user attitudes. The first problem we encountered concerned the type of questionnaire to adopt. Developing a valid questionnaire can take a considerable amount of time, and the absence of standardisation makes it difficult to compare results with other studies. We therefore decided to adopt standardised measurement tools for human-robot interaction, in addition to some metrics we found interesting for our research. As part of our survey, we adopted the Godspeed questionnaire [7], which uses semantic differential scales for evaluating attitudes towards the robot. The questionnaire contains questions (variables) about five concepts (latent variables): Anthropomorphism, Animacy, Likeability, Perceived Intelligence and Perceived Safety (for a detailed description and the full set of questions, please refer to [7]).

Anthropomorphism refers to the attribution of human features and behaviours to non-human agents, such as animals, computers or robots. Anthropomorphism variables were (left value scored as 1, right value scored as 5): fake–natural, machinelike–humanlike, unconscious–conscious, artificial–lifelike, moving rigidly–moving elegantly.

Animacy is the property of living agents. Robots can perform physical behaviours and reactions to stimuli. The participants' perception of robot animacy can give important insights for improving robot skills. Variables were: dead–alive, stagnant–lively, mechanical–organic, artificial–lifelike (distinct from the pair in anthropomorphism, as here it relates to animacy), inert–interactive, apathetic–responsive.

Likeability may influence the user's judgements. Some studies indicate that people often make important judgements within seconds of meeting a person, and it is assumed that people judge robots in a similar way [7]. Likeability variables were: dislike–like, unfriendly–friendly, unkind–kind, unpleasant–pleasant, awful–nice.

Perceived Intelligence is one of the most important metrics for evaluating the efficacy of the implemented skills. It can depend on robot competence, but the duration of the interaction is also one of the most influential factors, as users can become bored if the interaction is long and the vocabulary of the robot's behaviours is limited. Variables were: incompetent–competent, ignorant–knowledgeable, irresponsible–responsible, unintelligent–intelligent, foolish–sensible.

Perceived Safety is a metric for estimating the user’s level of comfort when interacting with the robot and the perception of the level of danger. Variables were: anxious–relaxed, agitated–calm, quiescent–surprised (this variable was recoded, as explained in the next paragraph).

The reliability of the questionnaire was analysed by its authors, who claim that the questions have sufficient internal consistency and reliability; to confirm this, we computed Cronbach's alphaFootnote 1 for each latent variable again. We found that Cronbach's alpha was negative (α=−1.111) for the latent variable Perceived Safety, due to a negative average covariance among items. This violated the assumptions of the reliability model for that set of variables and pointed to a miscoded variable. The questionnaire is written in such a way that high values of one variable mean the same thing as low values of another; the miscoded variable was Quiescent (scored as 1) to Surprised (scored as 5), probably because participants intended quiescence as a synonym for calmness (the preceding variable was Agitated, scored as 1, to Calm, scored as 5). After recoding the quiescent–surprised variable, Cronbach's alpha proved to be much higher (α PerceivedSafety =0.839).Footnote 2 We did not find any problems with the rest of the latent variables: α Anthropomorphism =0.825, α Animacy =0.853, α Likeability =0.813, α PerceivedIntelligence =0.750.
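Cronbach's alpha can be computed directly from the item scores; the sketch below (with made-up responses, not our data) also reproduces the effect of recoding a reverse-scored item on a 1–5 scale (x becomes 6−x):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score).
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Made-up responses where the third item is reverse-scored
scores = np.array([[1, 2, 5],
                   [2, 2, 4],
                   [4, 5, 2],
                   [5, 4, 1],
                   [3, 3, 3]], dtype=float)

alpha_raw = cronbach_alpha(scores)     # negative: miscoded reversed item
recoded = scores.copy()
recoded[:, 2] = 6 - recoded[:, 2]      # recode the reversed item
alpha_fixed = cronbach_alpha(recoded)  # high after recoding
```

As in our Perceived Safety case, a reversed item drives the average inter-item covariance negative and alpha below zero; recoding restores a high alpha.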

In addition to the Godspeed questionnaire, we introduced a new latent variable for measuring the concept of User Satisfaction, with two variables: frustrating–exciting and unsatisfying interaction–satisfying interaction (high Cronbach's alpha: α UserSatisfaction =0.799).

Open questions were also introduced about the understanding of the behaviour of the robot, its desires, whether it aimed to interact or not, its success, its gender (with an explanation of the choice), its age, the type of communication during the interaction, expectations about future improvements, and differences between Nao and humans.

3.5.2 Proxemics

According to the sociological concept of proxemics, humans, as well as animals, tend to define personal spheres which delimit areas of physical distance that correlate reliably with how much people have in common [19]. The boundaries of such spheres are determined by factors like gender, age and culture. Entering another person's sphere may make them feel intimidated, while staying too far away can be seen as cold or distant. Four spheres were identified in [19]: Intimate Distance (from 0 to 45 cm), reserved for embracing, touching and whispering; Personal Distance (from 45 to 120 cm), reserved for friends; Social Distance (from 1.2 to 3.6 m), reserved for acquaintances and strangers; and Public Distance (more than 3.6 m), reserved for public speaking.
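Hall's four zones translate directly into a classification of a measured distance; the function name is ours, the thresholds are those from [19], in centimetres:

```python
def proxemic_zone(distance_cm):
    """Classify a distance (cm) into Hall's proxemic zones [19]."""
    if distance_cm < 45:
        return "intimate"   # embracing, touching, whispering
    if distance_cm < 120:
        return "personal"   # friends
    if distance_cm < 360:
        return "social"     # acquaintances and strangers
    return "public"         # public speaking
```

Note that the ca. 90 cm seating distance in our setup places the robot inside the user's personal zone.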

To the best of our knowledge, no assumptions about the existence of such boundaries have been made in human-robot interaction. The focus has instead been on identifying the factors that influence interaction distance, which include user age or gender, pet ownership, and crowdedness in the environment or available space, as shown in [8, 19]. However, those analyses did not include users' perceptions of the behaviour or features of the robot.

We included proxemics measurements hoping to find correlations between interaction distance and the factors treated in the questionnaire. We also analysed participant behaviour by measuring the distances between the face of the robot and the face of the user, and between the face of the robot and the hand of the user.Footnote 3

As introduced in Sect. 3.2, proxemics analyses were done by gathering data from the videos recorded during the interaction sessions (Fig. 2 shows a sample frame). Videos were annotated manually: every 5 seconds, the face-face and face-hand distances were visually estimated by the operator, who manually projected their positions onto the scale drawn on the whiteboard.

Participants were sitting on a chair (they all started at the same distance from the robot), but they were told to feel free to interact in whatever way they considered most appropriate. However, only in very few cases (2 participants) did they stand up. In both cases, we gathered the face-face and face-hand distances as projected onto the horizontal line parallel to the table.

3.6 Results

This section presents the quantitative evaluation of our experiments.

In an earlier experiment we noticed some interesting patterns [4, 9]. It seemed that if a person holds the object close to the robot's hand, then Nao's pointing will be perceived as a desire to grasp the object. This could indicate, along with the hypothesis that pointing emerges from grasping, that there is also a reverse connection: pointing can be perceived as grasping if the object is too close to the hand.Footnote 4 Furthermore, most of the participants in the preliminary experiment responded that Nao was either likeable or very likeable and that the speed of the robot was good (out of three possible answers: too fast, good and too slow), even though the execution speed was lower than in the current experiment. All participants in the preliminary experiment, except one, had no previous experience with robots.

Figure 3 shows the means and the standard deviations of the responses.

Fig. 3
figure 3

These graphs show the results taken from the Godspeed questionnaire

First, we checked whether the distributions of the collected data were normal, in order to select the proper statistical tests. For each variable (that is, for each question), we superimposed the histogram of the data on a normal curve with the mean and variance of the data. Most of the histograms did not fit the corresponding normal curves well. Thus, we checked the kurtosis and the skewness of the data,Footnote 5 in order to have a more precise measure of the normality of the distributions. The distributions of all the variables related to the questionnaire had kurtosis and skewness between −2 and +2, while 17 out of 64 distributions related to the variables of the proxemics analysisFootnote 6 did not.
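The screening rule (skewness and kurtosis both within ±2) can be expressed in a few lines; note that scipy's `kurtosis` returns excess kurtosis by default, which is the convention assumed in this sketch:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def roughly_normal(x, bound=2.0):
    """Heuristic normality screen: skewness and excess kurtosis
    must both lie within [-bound, +bound]."""
    x = np.asarray(x, dtype=float)
    return bool(abs(skew(x)) <= bound and abs(kurtosis(x)) <= bound)
```

A roughly Gaussian sample passes the screen, while a heavily right-skewed one fails it, mirroring the split we observed between questionnaire and proxemics variables.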

Given the non-normality of some distributions, it seemed more appropriate to apply non-parametric statistical tests for the whole analysis. However, the use of ANOVA on Likert-scale data, and without the assumption of normality of the distributions to be analysed, is controversial. In general, researchers claim that only non-parametric statistics should be used on Likert-scale data and when the normality assumption is violated. Vallejo et al. [20], however, found that the Repeated Measures ANOVAFootnote 7 was robust to violations of the normality assumption. The simulation results of Schmider et al. [21] also confirm this observation, since they found in their Monte Carlo study that the empirical Type I and Type II errors in ANOVA were not affected by the violation of assumptions.

3.6.1 Correlations

A Spearman’s Rank Order correlationFootnote 8 was run to determine the relationship between perceived factors and between them and average human-robot distances. Each run was done for each experimental session (exploration, interaction, interaction avoidance and full interaction).

Tables 1 and 2 show some of the most relevant correlations. In addition to the data shown in the tables, it has to be noted that in the exploration test there was a strong, positive correlation between almost all the anthropomorphism variables and the perceived intelligence attributes related to competence and knowledge; in interaction, the higher the likeability of the robot, the higher the variance of face-face distance during all the interaction tests (r=0.805, P=0.029, N=7); in full interaction, perceived intelligence was found to be positively correlated with almost all the other variables (except those related to perceived safety) with r>0.5 and almost always significant at the 0.01 level.

Table 1 Most relevant correlations (Part 1). For the full tables, please contact the authors
Table 2 Most relevant correlations (Part 2). For the full tables, please contact the authors

3.6.2 Repeated Measures ANOVA

Because the participants were the same in each of the four observations, we adopted the repeated measures ANOVA test (with post-hoc tests using Bonferroni correction) for the analysis of variances. Also known as within-subjects ANOVA, repeated measures ANOVA is the equivalent of the one-way ANOVA, but for related rather than independent groups. We performed the test on all the dependent variables.Footnote 9
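For a single dependent variable, the within-subjects F statistic removes between-subject variability before testing the condition effect. A minimal numpy version (a sketch, not the statistics package used for the analysis) makes the partition of the sums of squares explicit:

```python
import numpy as np

def rm_anova(data):
    """One-way repeated measures ANOVA.

    data: (n_subjects x k_conditions) array of scores.
    Returns the F statistic and its degrees of freedom (k-1, (n-1)(k-1)).
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    grand = data.mean()
    # Variability due to conditions (the effect of interest)
    ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()
    # Variability due to subjects, removed from the error term
    ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((data - grand) ** 2).sum()
    ss_err = ss_total - ss_cond - ss_subj
    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    return (ss_cond / df_cond) / (ss_err / df_err), (df_cond, df_err)
```

Subtracting the subject sum of squares is what distinguishes this from a one-way ANOVA on independent groups: consistent individual differences between participants no longer inflate the error term.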

Post-hoc tests revealed that the four different behaviours performed by the robot did not significantly change the participants' perception of the anthropomorphic attributes related to naturalness, humanlikeness, consciousness and artificiality. Table 3 shows the statistically significant results of the repeated measures ANOVA on the questionnaire variables.

Table 3 Statistically significant results of repeated measures ANOVA on the questionnaire variables. Cases with sphericity assumption violated were corrected with Greenhouse-Geisser method. The table shows the statistically significant pairwise comparisons (illustrating the changes in means from an observation to another), taken from the post-hoc test with Bonferroni correction

The proxemics variables contain a high number of missing values. In order to perform repeated measures ANOVA on these variables, we replaced missing values using multiple imputation (n=20). New samples were created in which the proxemics information was inferred using the questionnaire variables as predictors.Footnote 10
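The idea can be sketched as regression-based imputation with noise added independently to each of the n copies. This is a simplification of proper multiple imputation, not the exact procedure we used; the single-predictor setup and variable names are illustrative:

```python
import numpy as np

def impute_multiple(predictors, target, n_imputations=20, seed=0):
    """Simple regression-based multiple imputation (illustrative only).

    predictors: (n,) or (n x p) fully observed data (questionnaire variables)
    target:     length-n vector with NaNs (a proxemics variable)
    Returns n_imputations completed copies of `target`; missing entries are
    filled with the regression prediction plus residual-scaled noise, so the
    imputations vary across copies as in proper multiple imputation.
    """
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(target)), predictors])
    miss = np.isnan(target)
    # Fit on the observed rows only
    beta, *_ = np.linalg.lstsq(X[~miss], target[~miss], rcond=None)
    resid = target[~miss] - X[~miss] @ beta
    sigma = resid.std(ddof=X.shape[1]) if len(resid) > X.shape[1] else 0.0
    copies = []
    for _ in range(n_imputations):
        filled = target.copy()
        filled[miss] = X[miss] @ beta + rng.normal(0, sigma, miss.sum())
        copies.append(filled)
    return copies
```

Each completed copy can then be analysed separately and the results pooled, which is how multiple imputation propagates the uncertainty of the missing proxemics values into the ANOVA.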

Table 4 shows the statistically significant results of the repeated measures ANOVA on the proxemics variables.

Table 4 Statistically significant results of repeated measures ANOVA on the proxemics variables. Cases with sphericity assumption violated were corrected with Greenhouse-Geisser method. The table shows the statistically significant pairwise comparisons (illustrating the changes in means from an observation to another), taken from the post-hoc test with Bonferroni correction. Missing values were replaced with multiple imputations. The new dataset contained 560 samples. Abbreviations: AV: average; VAR: variance; FF: distance between the face of the robot and the face of the user; FH: distance between the face of the robot and the closest hand of the user; all: considering the whole duration of the test (60 seconds)

3.6.3 Latent Growth Curve Model

A latent growth curve model was also used to assess the change in user perception over the four behaviours. This model uses a structural equation to estimate two latent variables, the slope and intercept, to assess the average linear change across the measurements, where the individual measurements are the indicators of the latents.Footnote 11 The estimated population distribution of the linear change (or growth) trajectory, denoted by the slope and the intercept of a linear function, are derived from this structural equation model. The estimator selected for the procedure was a Bayesian estimator with non-informative priors.Footnote 12 All calculations were produced with Mplus 6.11.
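Under standard assumptions, the linear growth model described above can be written as follows (our notation; the loadings λ_t are fixed to 0, 1, 2, 3 for the four behaviours):

```latex
y_{it} = \eta_{0i} + \lambda_t \, \eta_{1i} + \varepsilon_{it},
\qquad \lambda_t = t,\quad t = 0, 1, 2, 3,
\\[4pt]
\eta_{0i} = \alpha_0 + \zeta_{0i},
\qquad
\eta_{1i} = \alpha_1 + \zeta_{1i},
```

where \(y_{it}\) is participant \(i\)'s score at observation \(t\), \(\eta_{0i}\) and \(\eta_{1i}\) are the latent intercept and slope, and \(\alpha_0\), \(\alpha_1\) are their population means. A positive \(\alpha_1\) whose credibility interval excludes zero indicates an average upward trend across the four behaviours.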

The estimated slopes were positive for almost all the items, with positive credibility intervals, meaning that there is a significant positive trend in the average score from the first observation (exploration) to the last (full interaction).Footnote 13

3.7 Discussion

Despite the small sample size of the data collected during the experiments (especially regarding the proxemics analysis), the outcomes suggested many elements and features that need to be carefully taken into account when developing attentive mechanisms for intuitive robot behaviour.

3.7.1 Godspeed questionnaire

The adoption of the Godspeed questionnaire allowed us to test its qualities. Questionnaires are important tools for measuring user perception, and the Godspeed questionnaire provided us with a good instrument for measuring the quality of the implemented robot behaviours. Its authors noted that comparing different robots and their settings by means of the same measurement index will help roboticists make design decisions. In [22], the indices of the Godspeed questionnaire were tested as measures of human-like characters. The results indicated significant and strong correlations among some relevant indices, and new indices were proposed. This matches the comments of most of the participants in our experiments, who complained about the similarity between many questions and about some high-level attributes that were difficult to assign to the robot. The problem we reported with the miscoded variable, together with these observations, suggests not adopting the original version of the Godspeed questionnaire for further experiments, but rather its revised version.

To the best of our knowledge, no other study on attentional mechanisms for robots has adopted the Godspeed questionnaire as a metric. However, in [23], the authors studied the combined and individual contributions of gestures and gazing to the persuasiveness of a story-telling robot, measuring user perception with the Godspeed questionnaire. The robots used persuasive gestures (or not) to accompany the persuasive story, and also used gazing (or not) while telling it. The results indicated that only gazing had a main effect on persuasiveness, while the use of gestures did not. Moreover, the combined effect of gestures and gazing on persuasiveness was greater than the effect of either alone. This study suggests that adding multiple social cues can have additive persuasive effects, matching what we discuss in the next subsection about multi-modal interaction and efficient feedback systems.

3.7.2 Correlations

Correlation analysis confirmed our expectations and suggested directions for improving robot attention mechanisms. Positive correlations between anthropomorphic attributes and perceived intelligence confirmed that a human-like appearance can increase a robot's perceived intelligence. However, an excessively human-like appearance can lead the interacting person to form overly high expectations of the robot's cognitive capabilities, which can provoke disappointment whenever the robot does not fulfil them. We believe that the positive correlations between the anthropomorphic attributes and the perceived intelligence reflect a good balance between Nao's human-like appearance and its implemented cognitive capabilities. Consistent with this hypothesis, most of the participants did not try to communicate vocally with the robot, suggesting that they were not expecting this interaction modality, given the absence of a mouth on the robot's face and of any verbal capability in the robot.

Positive correlations between the robot's interactiveness and the users' excitement and their perception of lifelikeness and intelligence (see Table 1, correlations between Animacy: interactive and Perceived Intelligence: intelligent) also suggested that interactive capabilities emerging from attention mechanisms can increase the robot's perceived level of intelligence. These results also support the view that a robot has to be highly interactive to be perceived as a highly intelligent agent, and responsive to increase user satisfaction.
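The bivariate tests behind a table of this kind can be reproduced in miniature. The sketch below uses synthetic 5-point Likert ratings for two hypothetical Godspeed items; the variable names, sample size and data are illustrative only, not the values from our study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30  # hypothetical number of respondents

# Hypothetical 5-point Likert ratings for two Godspeed items;
# "intelligent" is constructed to co-vary with "interactive".
interactive = rng.integers(1, 6, n).astype(float)
intelligent = np.clip(np.round(interactive + rng.normal(0, 1, n)), 1, 5)

# Pearson correlation coefficient between the two item scores.
r = np.corrcoef(interactive, intelligent)[0, 1]
print(f"Pearson r = {r:.2f}")
```

In practice one would also report the p-value and sample size for each coefficient (e.g. via `scipy.stats.pearsonr`), as done for the correlations reported in this section.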

We believe that the robot's responsiveness and interactivity contribute substantially to user satisfaction, and that this contribution can be increased by improving its feedback system. A well-designed feedback system could reduce the consequences of some of the robot's limitations. In our experiments, participants experienced issues related to the limited field of view of the Nao (58° diagonal FOV). It is plausible that humans expect humanoids to have approximately human-like characteristics, such as the field of view, or two eyes for vision. During the experiments, participants, without being aware of it, often waved to the robot or handed over the object outside the robot's field of view, which caused no reaction from it. This affected the perceived responsiveness and interactiveness of the robot. A little foresight in the feedback system, such as changing the colour of the head LEDs or emitting a sound whenever the robot detected something, could probably have reduced this effect.

Multi-modal interaction (through arm or head movements) increased the level of interactiveness perceived by participants, as suggested by the correlations between Animacy: interactive and several other variables (see Table 1), which were higher during interaction than during the other behaviours. The observation in [23] that combining gestures and gazing increases the persuasiveness and likeability of the robot matches our observation about multi-modal interaction. The positive correlation between elegance of movement and user satisfaction suggests that the robot should perform smooth and natural movements in order to increase the quality of the interaction.

A trustworthy and lifelike robot can be better accepted as a companion or as a co-worker, where close interaction is needed, as suggested by the negative correlation between lifelikeness and the average face-to-face distance recorded during interaction (r = -0.805, p = 0.029, N = 7).

3.7.3 Repeated Measures ANOVA

Repeated measures ANOVA results showed that the aliveness of the robot scored lower during exploration than during interaction and full interaction, again supporting our expectation that multi-modal interaction increases the expressiveness of the robot's behaviours (in exploration, the robot performed only head movements). Here too, more expressive movements or a better-designed feedback system could have increased the level of perceived animacy, likeability and user satisfaction.

The less satisfactory the interaction was perceived to be, the more often and the more frantically the participants moved their hand. Repeated measures ANOVA confirmed that the variance of the face-hand distance was higher during interaction avoidance (the behaviour least satisfactory to the users) than during the other behaviours. It is also worth noting how successful the interaction avoidance behaviour was: the robot did cause frustration in the users, in line with its motivation of avoiding the interaction. Several participants commented on this behaviour by attributing mental states to the robot, such as shyness and anger.
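The one-way repeated measures F-test used in this section can be sketched directly in numpy: total variability is partitioned into a between-behaviours component, a between-subjects component and a residual, and the F-ratio compares the first to the last. The numbers below are synthetic; the behaviour effect and sample size are illustrative, not our measured data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical dependent variable (e.g. variance of face-hand distance)
# for 8 participants under the four robot behaviours.
n_subj, n_cond = 8, 4
cond_effect = np.array([0.0, 0.0, 1.5, 0.0])   # one behaviour stands out
subj_effect = rng.normal(0, 0.3, n_subj)       # stable individual differences
data = (subj_effect[:, None] + cond_effect[None, :]
        + rng.normal(0, 0.3, (n_subj, n_cond)))

grand = data.mean()
ss_cond = n_subj * ((data.mean(axis=0) - grand) ** 2).sum()
ss_subj = n_cond * ((data.mean(axis=1) - grand) ** 2).sum()
ss_total = ((data - grand) ** 2).sum()
ss_err = ss_total - ss_cond - ss_subj          # subject x condition residual

df_cond = n_cond - 1
df_err = (n_cond - 1) * (n_subj - 1)
F = (ss_cond / df_cond) / (ss_err / df_err)
print(f"F({df_cond}, {df_err}) = {F:.2f}")
```

Removing the between-subjects sum of squares from the error term is what makes the design "repeated measures": each participant acts as their own control, so stable individual differences do not inflate the denominator.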

4 Conclusions

We have created a saliency-based attentional model combined with a robot ego-sphere and implemented it on a humanoid robot. In human-robot interaction experiments using this model, we showed that the different attentional behaviours of the robot have a strong influence on the interaction as experienced by the human.

We have shown that, even on robots with limited computational capacities such as the Nao, it is possible to sustain an ongoing interaction between the robot and a person. The techniques used are a combination of bottom-up and top-down attention processes and an ego-sphere as a short-term memory representation, combined with motion, face and object detection.

The adopted questionnaires were useful for correlating perceived physical and behavioural robot features with proxemics data. We noticed some trends suggesting that some of the perceived variables could influence the interaction distances.

Through the discussion of the results in the previous section, we identified those characteristics that need to be emphasised and those skills that have to be taken into account (like providing enough feedback during the interaction) when implementing attentive mechanisms in robots.

For future experiments we plan to explore different approaches for dynamically assigning weights to the different filters. We also plan to extend the system with additional filters on the Nao robot (e.g. for audio localisation), as well as to port the approach to other robot platforms. It would be interesting to see how these attentional models would fare on other, non-humanoid platforms. Additionally, the presented full interaction behaviour, consisting of exploration, interaction and interaction avoidance, can be applied to more complex scenarios, and we are planning to explore this further. Gesture recognition and synthesis, and behaviour recognition and execution, would enable the robot to better communicate its intentions and understand the intentions of others. We believe that giving visual and auditory feedback to the participant is extremely important for increasing the intuitiveness of the interaction and user satisfaction.

Another interesting research question is what movement speed a robot should exhibit in order to be perceived as harmless. We have included this topic in the planned future development of the experiment.

We believe that these experiments represent a step in the right direction toward reaching joint attention between a human and a robot. We showed that basic attention manipulation is possible, even with simple robot platforms such as the Nao, and that participants will assign different characteristics to the robot based on its behaviour.