This research aims to clarify the type of autonomous movements appropriate for telepresence robots. The design of telepresence robots’ autonomous movements should take into account both local and remote users. From the perspective of local users, we need autonomous movements that enhance social telepresence in order to smooth remote communication. From the perspective of remote users, autonomous movements should not only reduce the operation load but also avoid causing discomfort. However, in previous studies on automation, the criteria for which types of movement should be automated have remained unsettled. In this paper, we focus on the distinction between voluntary and involuntary movements as a classification that can serve as such a criterion. Voluntary movements are intentional movements, whereas involuntary movements are movements without intention. To verify the effect of automating these movements, we developed a semi-autonomous telepresence robot that automates voluntary and involuntary movements. We then evaluated the impressions of local and remote users by conducting two experiments, one from each perspective. The results show that local users evaluated both voluntary and involuntary autonomous movements positively as long as they were not used in excess, while they suggest that automation of voluntary movements should be implemented with care for remote users.
In remote communication, maintaining a high social presence [4, 23] is important to ensure that communication flows smoothly, similar to face-to-face communication. Recent studies have shown that social presence can be improved by using video along with audio in remote communication [4, 13]. However, the presence of remote participants in a conference setting may be overlooked, even with video. This fact suggests that remote communication with video fails to sufficiently present the presence of remote participants [7, 14]. Therefore, telepresence robots for remote communication have been attracting attention.
However, since remote users have to operate the robot while talking, they cannot concentrate on their communication. Therefore, the operation load on remote users when using telepresence robots is a serious problem. Here, a remote user is a person who is remotely connected to a robot via a computer interface, and a local user is a person who is situated in the same physical location as the robot. There are two typical approaches to reducing the operation load on remote users. One recognizes the motion of the remote user through motion capture and has the robot automatically move in the same manner [6, 9, 20]. The other uses a telepresence robot with semi-autonomous movements [22, 33].
Regarding the motion-capture approach, Hasegawa et al. reported that it is possible to avoid speech collision by having a robot with multiple degrees of freedom express the captured movements of remote users. There have also been studies on enabling telepresence robots to move semi-autonomously and on the robots’ effect on remote users’ social presence and sense of agency [22, 33]. Tanaka et al. reported that a local user who communicated with a remote user through a telepresence robot felt the social presence of the remote user even if the robot’s nods were autonomous. Nakamichi et al. reported that even when nodding was fully autonomous, a robot can exhibit a sense of agency if the autonomous movements are appropriate.
Approaches in which a remote user directly expresses the movements of a robot, typified by motion capture, tend to require large equipment and a complex setup to construct a telepresence architecture. Semi-autonomous telepresence robots, on the other hand, raise the issue of selecting which movements to express automatically. While autonomous motion is important for reducing the operation load on remote users, the remote user’s sense of agency, that is, the ability to recognize oneself as the agent of a movement, may be lost, or the autonomous movements may conflict with the operation of the remote user. However, in studies on automation [22, 33], the criteria for the appropriateness of autonomous movements were unclear.
In this research, we analyzed how the semi-autonomous movements of a humanoid telepresence robot affected local users and the impressions of remote users (Fig. 1). The main purpose of this research is to clarify which type of movements should be semi-autonomous. As a classification type of movement, this research focuses on voluntary and involuntary movements and their effect on semi-automation. Voluntary movements are the movements of intentional behavior, such as following a face, gazing at objects and so on. On the other hand, involuntary movements are the movements without intention, such as blinking or breathing.
To verify the effect of automating such movements, we developed a telepresence architecture that expresses the two types of movements, each of which can be driven either manually or autonomously. We used the robovie-mR2 as our telepresence robot and implemented our architecture, which combines autonomous movements with manual movements controlled by a remote operator. This architecture is an extension of the simple bi-layered (SB) architecture, a behavior-generation architecture.
The structure of this paper is as follows. In Sect. 2, we explain the current research and the SB architecture. In Sect. 3, we discuss our proposed telepresence architecture with semi-autonomous functionality. In Sects. 4 and 5, we explain the two experiments we conducted to investigate the two effects of autonomous robot movements. One experiment centers on the local user and the other on the remote user. Both experiments involved role-play simulations using semi-autonomous telepresence robots and were analyzed through video analysis and a subjective evaluation using a questionnaire. In Sect. 6, we discuss our considerations. Finally, we present our conclusion in Sect. 7.
Overview of Telepresence Robots
Kristoffersson et al. surveyed telepresence robots. There are two types of telepresence robots. The first type includes mobile robotic telepresence (MRP) systems. A typical example of an MRP system is the Personal Roving Presence (PRoP) [27,28,29]. Many telepresence robots such as the PRoP have been developed [1, 37]. These robots enable remote users to move around in a local environment. The second type was developed to promote the sense that a remote user is in the same place as a local user [10, 19, 25, 26, 30, 34, 39]. We mainly focus on the latter type. To emphasize the presence of a remote user, it is necessary to implement many types of human behaviors on robots. Therefore, we used a humanoid robot.
Methods of Operating Telepresence Robots
The simplest methods of operating robots are those that use keyboard inputs and those that provide a graphical interface with controls. Although such methods are used frequently because they are simple and require no special device, simple interfaces are sometimes considered insufficient for expressing the rich behavior of robots. Control methods using brain waves [35, 36] have been proposed, but with existing measuring technologies, it is difficult to express sufficiently rich behavior.
Therefore, a common method involves observing the movements of a remote operator through motion capture and having a robot express the same movements. Hasegawa et al. reported that a multi-degree of freedom (multi-DOF) telepresence robot can express the preliminary movements of a remote user so that speech alternation could be efficiently carried out. Matsui et al. aimed at facilitating remote communication by expressing information acquired by motion capture with a telepresence robot that demonstrated an appearance and mannerisms very close to those of a human. Fernando et al. also used motion capture to realistically reproduce the remote user’s movements on a robot.
However, special equipment is required on the remote user’s side in systems using motion capture. In this research, we focused on a telepresence robot that can be operated with a simple controller but can express rich and smooth nonverbal human-like behavior.
Various studies have been conducted on semi-autonomous control for systems such as cars, airplanes, and robots. Approaches to introducing semi-autonomy can be grouped into the following three types.
The first is reducing the operation load on humans under the premise that a system is controlled by humans. This includes the emergency stopping of cars to prevent accidents caused by human error in operation. Since the purpose of the semi-automation is to reduce the operation load, control of the system requires only supplemental movements. In other words, only involuntary movements are autonomous in many cases. A typical example of such an approach is automatically controlling a part of the system as an improvement of the Wizard of Oz method [3, 5]. Semi-autonomous movements are increasingly useful in MRP systems [2, 38]. A typical use is to autonomously move a robot to a location indicated by a remote user. There have been attempts to implement autonomous movements in a telepresence system using a humanoid robot called NAO.
The second type is semi-automation in which humans assist the system. Although ideally the system should move completely autonomously, it is difficult to achieve fully autonomous control. Therefore, humans are introduced into the system to compensate. The following studies investigated such approaches. Shiomi et al. proposed semi-autonomous telepresence robots, based on the human model proposed by Norman, that request control from remote users only when the robots cannot respond autonomously [17, 31]. Such robots often express autonomous intentional movements such as greetings and route guidance.
The third type takes into account situations in which humans and systems control a robot as a single medium in an equal relationship. This requires a high degree of mutual adaptation by humans and systems, and there have been only a few studies. If the third type is put into practical use, it is possible that a local user will treat the robot as a human under the impression that there is a human in control, even though it is under autonomous control.
We focused on the automation of a telepresence robot to put the abovementioned third type of approach into practical use. In a system that executes the semi-automation of telepresence robots, screening autonomous movements is important. Such movements have the potential to promote rich behavior and reduce a remote user’s operation load but may cause discomfort. However, behavior suitable for automation has yet to be investigated. We examined the methodology of autonomous movements of a semi-autonomous telepresence robot.
Contingent Behavior-Generation Architecture
Some of the authors have proposed a model of behaviors that integrates involuntary and voluntary movements in two levels.
For a robot to smoothly interact with a human, it is necessary to express behavior that would not be perceived as strange by humans. Typical antipatterns are movements that are awkward or that have inappropriate delays. Therefore, it is necessary not only to specify fixed movements in advance but also to respond instantly to changes in human behavior and the surrounding environment. This is called contingent behavior, and humans tend to actively communicate with robots that behave in a contingent manner.
Contingent behavior occurs arbitrarily in response to stimulation from the outside world. Multiple behaviors may be activated by environmental changes. If behaviors that move the same part of the robot’s body occur at the same time, collisions will occur. Architectures that mediate and integrate colliding behaviors exist, but simple behavior-generation architectures lack sufficient expressive ability. In a complicated architecture, on the other hand, it is necessary to add or modify movements within the architecture, which can be challenging depending on the situation.
The SB architecture was proposed to lower the design difficulty of contingent robots (this part is marked “SB architecture” in Fig. 2). This architecture integrates behaviors in two stages. The first is a priority assignment that selects only one movement from competing movements for each part of the robot’s body, and the second employs weighted averaging to mix multiple movements. In addition, the architecture classifies movements into voluntary and involuntary and registers them. In that work, voluntary movements were defined as actions accompanied by an intention, such as looking at an object to express interest and nodding to express listening. In contrast, involuntary movements were defined as actions without intention, such as spontaneous blinks and breathing. Our definition of voluntary and involuntary is the same. Therefore, regardless of the remote user’s and the robot’s intention at the time of behaving, a movement that a human performs intentionally is a voluntary movement, and a movement that a human performs unintentionally is an involuntary movement. Our developed architecture, which is an extension of the SB architecture, can automatically select and integrate preset voluntary and involuntary movements for a robot to communicate with humans. Voluntary movements such as gazing at a face and specific objects, eye contact, joint attention, and arm raising, and involuntary movements such as blinking, breathing, and saccades have been implemented. Table 1 lists the movements that are implemented, where types V and I denote voluntary and involuntary movement, respectively.
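The two-stage integration can be illustrated with a minimal sketch. It is not the authors' implementation: the `Movement` record, its fields, and the assumption that stage 1 applies to voluntary movements while stage 2 blends the surviving voluntary movement with all involuntary movements on the same body part are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Movement:
    # All fields are hypothetical; the paper does not specify a data model.
    name: str
    part: str        # body part the movement drives, e.g. "head_yaw"
    kind: str        # "V" (voluntary) or "I" (involuntary), as in Table 1
    priority: int    # used in stage 1 only (higher wins)
    weight: float    # used in stage 2 only
    target: float    # desired joint angle in degrees


def integrate(movements):
    """Return one blended target angle per body part."""
    by_part = defaultdict(list)
    for m in movements:
        by_part[m.part].append(m)

    result = {}
    for part, candidates in by_part.items():
        voluntary = [m for m in candidates if m.kind == "V"]
        involuntary = [m for m in candidates if m.kind == "I"]
        # Stage 1: keep only the highest-priority voluntary movement.
        selected = [max(voluntary, key=lambda m: m.priority)] if voluntary else []
        # Stage 2: weighted average of the survivor and the involuntary movements.
        mixed = selected + involuntary
        total = sum(m.weight for m in mixed)
        if total > 0:
            result[part] = sum(m.weight * m.target for m in mixed) / total
    return result
```

In this sketch, two competing gaze movements on the same joint never blend with each other; only the winner is mixed with small involuntary motions such as swaying.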
A Semi-autonomous Telepresence Robot with Voluntary/Involuntary Behaviors
To conduct the experiment to investigate the effect of automation of voluntary and involuntary movements, we developed a semi-autonomous telepresence system that automates the two types of movements.
A Semi-autonomous Telepresence Architecture
We expanded the SB architecture to develop a semi-autonomous system capable of remote control (Fig. 2).
Remote operation is input at arbitrary times as a voluntary movement, in parallel with autonomous voluntary movements, and one movement is selected in accordance with priority. The selected movements are then merged on the basis of weighted averaging. The movement chosen by remote operation sometimes matches the automatically generated movements and sometimes conflicts with them. However, the priority of movements chosen by remote operation is higher than that of any autonomous voluntary movement. The remote user can choose a movement from all voluntary movements that the robot has. As in the prior work on the SB architecture, the robot has all movements listed in Table 1 implemented. In addition to these voluntary movements, the remote user can also manually control the gazing direction horizontally and vertically. For the weighted merge, we use the same algorithm as in that work. Additionally, the functions of sensors and memory are the same.
In this semi-autonomous architecture, there is no explicit switch between the remote and autonomous control systems; the two controls coexist in the same architecture. It is a unique architecture in the domain of telepresence robots. In a real use case, a user may decide not to use manual control at all or to stop the chosen autonomous movements, but in this paper we only consider the configuration in which both coexist at all times.
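The priority rule for remote operation can be sketched as follows. This is an illustrative assumption about how such a rule might be coded, not the system's actual code; the function name, the `(name, priority)` tuples, and the "top priority plus one" scheme are all hypothetical.

```python
def select_voluntary(autonomous, remote=None):
    """Pick the single voluntary movement to execute.

    autonomous: list of (name, priority) pairs for active autonomous movements.
    remote: name of the movement chosen by the remote user, or None.
    """
    candidates = list(autonomous)
    if remote is not None:
        # Hypothetical scheme: give the remote command a priority above
        # every active autonomous voluntary movement.
        top = max((p for _, p in autonomous), default=0)
        candidates.append((remote, top + 1))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]
```

Because there is no mode switch, an autonomous movement wins again as soon as `remote` becomes `None`, which mirrors the coexistence of the two controls described above.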
A Semi-autonomous Telepresence Robot
The robovie-mR2 robot has 18 servo motors, which can control the rotation of the eyeballs, opening and closing of the eyelids, and rotation of the head, arm, and waist. Cameras and microphones are not installed as standard, so they were installed separately.
To verify the adequacy of a particular autonomous movement, we focused on gaze direction. Gazing is important in human-robot interaction [12, 16]. We expect that the autonomous control of a robot’s gaze frequently conflicts with that of a remote user. Unwanted visual information is transmitted to the remote user in case of such a conflict. In addition, robovie-mR2 can precisely control its eyes, expressing good gaze direction. In fact, when gaze behavior conflicts occur, the local user easily notices the robot’s strange movements.
Typical targets of a robot’s gaze when communicating are a local user’s face and the point at which a local user is looking. Therefore, a function to look at the detected user’s face and a function to look in the direction in which the detected user is looking were implemented as autonomous movements (Fig. 4).
Since a robot cannot predict whether the remote user wants to see the face of the local user or the target of the local user’s attention, conflicts often occur between the remote user and autonomous movements of the robot.
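The choice between the two autonomous gaze functions can be sketched as a small decision rule. The sketch assumes the trigger the paper describes for joint attention (eye contact followed by the local user looking elsewhere); the function name, arguments, and labels are hypothetical, and the real system's perception pipeline is not modeled.

```python
def next_gaze_target(had_eye_contact, face_visible, user_gaze_target):
    """Pick the autonomous gaze target for one control step.

    had_eye_contact: True if eye contact with the local user was just recognized.
    face_visible: True if a face is currently detected.
    user_gaze_target: "robot", a label for what the user looks at, or None.
    """
    if not face_visible:
        return None                  # nothing detected, no autonomous target
    if user_gaze_target == "robot":
        return "face"                # mutual gaze: keep looking at the face
    if had_eye_contact and user_gaze_target is not None:
        return user_gaze_target      # joint attention: follow the user's gaze
    return "face"                    # default: look at the detected face
```

Nothing in this rule knows what the remote user wants to see, which is why the autonomous choice can conflict with the remote user's operation.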
Experiment 1: Evaluating Impressions of Local Users Regarding Robot Through a Small Group Conference Role-play
This experiment aimed to investigate the effect of automating the two types of movements, voluntary and involuntary, on the impressions of local users. As the evaluation method, we recorded videos of presentation role-play scenarios that assumed a small group conference held with our telepresence architecture, showed them to the experiment participants, and asked the participants to evaluate their impressions.
To investigate the effect of automation of the two types of movements precisely, we designed the experiment such that other effects were removed, while retaining the effect of automation movements. However, the impression of communication is influenced by various factors such as the content of the dialogue. Therefore, as an easy method to address factors unrelated to automated movements, we adopted the process of recording videos from presentation role-play scenarios assuming a small group conference and showed them to the participants of the experiment.
The scenario of the video assumes a small group conference. In the scenario, there were three performers, two local users and one remote user. The local users were Listeners or Presenters, and the remote user attended the presentation as a Listener via a telepresence robot. The arrangement of each performer and the equipment is shown in Fig. 5.
Two types of scenarios were prepared to minimize the influence of the meeting scenario on the impression of the participants of the experiment, called “pencil attraction” and “eraser shape”. “Pencil attraction” is a scenario in which the Presenter introduces the attraction of a pencil to the Listeners. “Eraser shape” is a scenario in which the Presenter introduces various eraser shapes to the Listeners. Both scenarios were approximately one minute and progressed in accordance with the flow shown in Table 2. The experimental conditions using the “pencil attraction” scenario are called the pencil conditions, and those using the “eraser shape” scenario are called the eraser conditions. During the presentation, the Presenter urges the Listeners to focus on the Monitor (scenario (d)) and the remote Listener to focus on the other Listener (scenario (e)). The remote user mainly communicated with the Presenter in this experiment.
To investigate the influence of their individual techniques and habits, three experimenters and one collaborator participated as performers. The collaborator was a 22-year-old male and the only person without prior knowledge of the experiment. Table 3 shows the conditions of this experiment. As the collaborator (C) was the only person without prior knowledge of the experiment, he played the Presenter for both types of scenarios so that the scenarios could be compared. Two experimenters (E1 and E2) played the Presenter under the pencil conditions, and the other (E3) played the Presenter under eraser conditions. A Latin square was used to determine the order of experimental conditions.
We recorded videos from three viewpoints at the same time: centered on the robot (Fig. 5: Camera 1), centered on the Presenter (Fig. 5: Camera 2), and the remote user’s viewpoint through the robot. We used the viewpoint of Camera 1 for analysis and evaluation and used the videos of the other viewpoints as auxiliary for consideration.
Twenty videos were recorded by combining two conditions of autonomous voluntary movements (ON and OFF), two conditions of autonomous involuntary movements (ON and OFF), and five patterns of the performer and scenario (c1–c5).
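The count of twenty follows from a full crossing of the three factors (2 × 2 × 5). A short sketch, with the tuple layout chosen only for illustration:

```python
from itertools import product

voluntary = ("ON", "OFF")       # autonomous voluntary movements
involuntary = ("ON", "OFF")     # autonomous involuntary movements
patterns = ("c1", "c2", "c3", "c4", "c5")  # performer/scenario patterns

# One tuple per recorded video: (voluntary, involuntary, pattern).
videos = list(product(voluntary, involuntary, patterns))
```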
Two evaluations were conducted. The first was through video analysis to objectively visualize the movements of the robot. All recorded videos were divided into one-second intervals, and the robot’s gazing point, arm swinging and nodding, and the remote user’s utterance were recorded. The seven parts of the scenarios ((a)–(g)) were normalized to be the same length.
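The paper does not state how the parts were normalized, so the following is only one plausible sketch: per-second labels of a scenario part are resampled to a fixed number of bins by picking the nearest earlier sample, and the share of time spent on a gaze target is computed from the labels. Both helper names and the single-letter labels are hypothetical.

```python
def normalize_part(labels, n_bins):
    """Resample one part's per-second labels to exactly n_bins samples."""
    if not labels or n_bins <= 0:
        raise ValueError("need at least one label and a positive bin count")
    # Integer arithmetic maps bin k to the sample covering the same relative time.
    return [labels[(k * len(labels)) // n_bins] for k in range(n_bins)]


def gaze_share(labels, target):
    """Percentage of samples in which the robot gazed at `target`."""
    if not labels:
        raise ValueError("need at least one label")
    return 100.0 * sum(1 for label in labels if label == target) / len(labels)
```

For example, a six-second part coded `["P", "P", "M", "M", "M", "L"]` becomes `["P", "M", "M"]` when resampled to three bins, so parts of different durations can be aligned on a common time axis.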
The second was a subjective evaluation to examine the impressions of those who observed the robot’s movements. As an observer, each participant watched four videos that differed in system configuration but shared the same performer-and-scenario pattern (one of c1–c5, chosen at random). Each of the four videos was watched only once. The four system configurations were the combinations of autonomous voluntary movements ON/OFF and autonomous involuntary movements ON/OFF.
All participants in the experiment answered the questionnaire shown in Table 4 each time they watched one video, and optionally wrote comments on the video afterwards. To mitigate the influence of the watching order, a counterbalance was taken in accordance with the Latin square.
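Counterbalancing with a Latin square can be sketched as below. The paper does not say which Latin square was used; this sketch assumes a simple cyclic square and a hypothetical rule that assigns rows to participants in rotation.

```python
def latin_square(conditions):
    """Cyclic Latin square: each condition appears once per row and column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for(participant_index, conditions):
    """Viewing order for a participant, rotating through the square's rows."""
    square = latin_square(conditions)
    return square[participant_index % len(conditions)]


CONFIGS = ["V-ON/I-ON", "V-ON/I-OFF", "V-OFF/I-ON", "V-OFF/I-OFF"]
```

With 40 participants and four configurations, each row of the square would be used ten times under this assignment rule, so every configuration appears equally often in every viewing position.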
The following two points were compared: autonomous voluntary movements were ON or OFF, and autonomous involuntary movements were ON or OFF. In short, there are the four comparison conditions of the system configuration, voluntary-ON/involuntary-ON, voluntary-ON/involuntary-OFF, voluntary-OFF/involuntary-ON, and voluntary-OFF/involuntary-OFF.
A total of 40 participants (22 males and 18 females, aged 19 to 28, average 22.85 ± 1.88 years old) watched the videos as observers and responded to the questionnaire.
Experiment Results 1: Analysis in Accordance with the Time Series
Figure 6 shows the results of the analysis in accordance with the time series. Each figure contains four conditions under the same condition of performers and scenarios. Time is represented on the horizontal axis and was normalized for each part of the scenarios shown in Table 2. The robot’s gazing points (Presenter, Monitor, Listener, Other) are red, other behaviors (Hand Wave, Nod) are blue, and the presence or absence of a remote user’s speech is green.
The results for when the collaborator played the Presenter are shown in Fig. 6. Under the condition that autonomous voluntary movements were enabled (voluntary-ON), the gaze was slightly shaking in all cases, and there were many cases in which the robot did not gaze at the Presenter, Monitor, or Listener. Under the condition that autonomous voluntary movements were disabled (voluntary-OFF), the remote user directed a gaze to the Monitor in the middle of part (c) and directed a gaze to the Listener in part (e). When the Presenter asked questions of the remote Listener in part (f), there was a tendency to look closely at the Presenter.
When the experimenter was the Presenter, under the condition of voluntary-OFF, there was a tendency for each individual to gaze in the direction of the gazing points corresponding to the progression of each scenario regardless of who was in charge of the remote user. For example, Experimenters 2 and 3 directed the gazing point to the Listener in the middle of one part of the scenario, but Experimenter 1 did not. On the other hand, under the condition of voluntary-ON, there was no tendency regarding the gazing points that depended on the performer.
Since Experimenter 1 developed the robot, he knew in depth which behaviors stimulate autonomous movements from the robot. For example, when the robot recognizes that there is eye contact with the Presenter and the Presenter gazes in a different direction, the joint attention movement automatically triggers. Experimenter 1 knew how to reliably trigger the eye contact with the correct facial angles. For the two conditions under which Experimenter 1 was the Presenter and voluntary-ON was implemented, the average percentages of time to gaze at the Monitor in part (d) and at the Listener in part (e) were 83%.
On the other hand, when Experimenters 2 and 3 were Presenters, the robot hardly moved while gazing at the Presenter or seemed to look around, and the percentages of time spent gazing at the Monitor and Listener were 27% and 50%, respectively.
Experimental Results 2: Subjective Evaluation Experiment by Experimental Participants
We used a two-way repeated-measures ANOVA with participants as a random factor. The results of the comparison regarding the autonomous voluntary and involuntary movements are shown in Fig. 7. The ANOVA revealed an interaction between voluntary and involuntary movements at the 5% significance level in Q2, Q3, Q4 and Q5 (Q2: F(1, 39) = 7.193, \(p < 0.05\); Q3: F(1, 39) = 4.534, \(p < 0.05\); Q4: F(1, 39) = 4.968, \(p < 0.05\); Q5: F(1, 39) = 5.371, \(p < 0.05\)). Additionally, the interaction was revealed at the 1% significance level in Q7 (F(1, 39) = 8.590, \(p < 0.01\)). In Q1 and Q6, no interaction was observed between voluntary and involuntary movements, and there was also no significant difference between the automation conditions of each movement. As post-hoc tests, we tested the simple main effects of each movement with the Bonferroni correction. Table 5 shows the results of the simple main effects; “V” denotes “Voluntary”, and “I” denotes “Involuntary”. For Q1 and Q6, the main effects are shown instead of the simple main effects.
Under the involuntary-OFF condition, the simple main effect of automating voluntary movements was revealed at the 1% significance level in Q2, Q3, Q5 and Q7. In Q3, the simple main effect of automating voluntary movements was also revealed at the 1% significance level under the involuntary-ON condition. Under the voluntary-OFF condition, the simple main effect of automating involuntary movements was revealed at the 5% significance level in Q2, Q3 and Q4; in Q2 and Q3, it was also revealed at the 1% significance level. Under the voluntary-ON condition, the simple main effect of automating involuntary movements was revealed at the 5% significance level in Q7.
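For a 2 × 2 design in which both factors are within-participant, the interaction F statistic equals the squared one-sample t statistic of each participant's difference of differences, with degrees of freedom (1, n − 1); with 40 participants this gives the F(1, 39) form reported above. The following sketch uses that equivalence with the standard library only. The data layout (a dict keyed by `(voluntary, involuntary)` flags per participant) is an assumption, and this is not the analysis script used in the study.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_f(scores):
    """Interaction test for a 2x2 fully within-participant design.

    scores: list with one dict per participant, mapping
            (voluntary, involuntary) in {0, 1} x {0, 1} to a rating.
    Returns (F, df1, df2).
    """
    # Per-participant interaction contrast: (V-ON effect of I) - (V-OFF effect of I).
    contrasts = [
        (s[(1, 1)] - s[(1, 0)]) - (s[(0, 1)] - s[(0, 0)])
        for s in scores
    ]
    n = len(contrasts)
    if n < 2:
        raise ValueError("need at least two participants")
    sd = stdev(contrasts)
    if sd == 0:
        raise ValueError("contrast has zero variance; F is undefined")
    t = mean(contrasts) / (sd / sqrt(n))
    return t * t, 1, n - 1
```

The p-value would then be read from the F distribution with the returned degrees of freedom; that step is omitted here to keep the sketch dependency-free.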
For Q1 and Q6, in which no interaction was observed, no significant difference was found for either voluntary or involuntary movements. Therefore, we continue the analysis with Q2, Q3, Q4, Q5 and Q7. In Q2, when either voluntary or involuntary movements were automated, participants gave higher ratings than under the voluntary-OFF/involuntary-OFF condition. We assume that they could easily perceive the change in the variety of movements in comparison with the condition in which no movements were automated. Likewise, in Q3, when either voluntary or involuntary movements were automated, participants gave higher ratings than under the voluntary-OFF/involuntary-OFF condition. This is natural because, without automation, gestures occurred only as often as the remote user operated the robot. In addition, in Q3, participants rated the voluntary-ON/involuntary-ON condition higher than the voluntary-OFF/involuntary-ON condition, whereas there was no significant difference between the voluntary-ON/involuntary-ON and voluntary-ON/involuntary-OFF conditions. Therefore, we assume that automating voluntary movements influenced the frequency of gestures. In Q4, when either voluntary or involuntary movements were automated, participants gave higher ratings than under the voluntary-OFF/involuntary-OFF condition; the difference was significant when involuntary movements were automated. We assume that this is because the implemented involuntary movements contained various movements related to eye movement. In Q5, when only voluntary movements were automated, participants gave higher ratings than under the voluntary-OFF/involuntary-OFF condition, which suggests that automating voluntary movements might influence the impression of the presentation. In the free descriptions of the 40 participants who watched the videos, 13 participants mentioned the awkwardness of the moving viewpoint in the voluntary-OFF/involuntary-OFF condition.
Therefore, we assume that the smoothly moving viewpoint achieved by automating voluntary movements influenced the smoothness of the presentation. Across these five questions, ratings tended to be high when either voluntary or involuntary movements were automated. However, when both were automated, the rating was the same as or lower than when only one was automated. In Q7 in particular, there was a significant difference between the voluntary-ON/involuntary-ON and voluntary-ON/involuntary-OFF conditions. In the free descriptions of the 40 participants who watched the videos, four participants mentioned discomfort caused by the mechanical noise during the robot’s motion. When we checked the videos again, the mechanical noise was noticeable under this condition. We therefore think that the mechanical noise frequently generated by the movements lowered the rating when both voluntary and involuntary movements were automated. Since the participants in this experiment watched videos, it is highly likely that they were particularly sensitive to the mechanical noise. However, this problem can be addressed by reducing the frequency of the specific movements that cause mechanical noise.
The telepresence system constructed in this paper prioritizes the remote user’s input when it conflicts with autonomous movements. This also means that the robot moves autonomously soon after the user input stops, which was found to be an issue. For example, if the user wants to see the presented panel and the robot is already looking in that direction, the user does not input any movement. However, the moment the robot recognizes eye contact with the presenter, it looks at the presenter’s face instead of the panel. This inability to understand the remote user’s intent led to frustration and exposed the need for a system that infers or adapts to the user’s intent.
Voluntary autonomous movements conflicted with remote-user operations, and frustrations due to such conflicts were reported when the Collaborator was interviewed after the experiment. Nonetheless, observing the robot’s movements from the viewpoint of the local user did not result in a significant loss of naturalness.
One potential reason for the above result is that the behavior of the robot developed for this experiment stayed within the range of the previously proposed method of expressing a robot’s voluntary and involuntary movements. The movement expressions used in this experiment are consistent with the SB architecture in that only one voluntary movement was selected at a time. The local user did not know whether a voluntary movement was autonomous or controlled by a remote user. For the remote user, on the other hand, there may be cases in which the robot’s movements differ from what the user selected. Therefore, it is highly likely that remote users will be unable to express their desired movements and will receive unwanted sensor feedback. Accordingly, in the next section, we discuss an evaluation that we conducted involving a remote user.
Summary of the Main Findings
Overall, when either voluntary or involuntary movements were automated, all evaluation values tended to rise. In particular, automated voluntary movements obtained a high evaluation value even when remote user input conflicted with the autonomous movements. On the other hand, when both voluntary and involuntary movements were automated at the same time, the evaluation of the robot’s behavior declined compared to the situation in which only one of them was automated. Judging from the participants’ comments, we assume that the reason is that the mechanical noise was noticeable when both movements were automated.
Experiment 2: Evaluation of Impression of Remote User by Proposal Presentation Role-play
As a next step, we investigate the influence on the impressions of a remote user. Unlike the previous experiment that used only video, the participants actually joined in the role-play. However, since an evaluation through video analysis was also conducted, as described later, we recorded the video in the same manner as discussed in the previous section.
We performed a role-play simulation assuming a presentation. There were two people in the conference: one was a collaborator playing the role of the local user who was the Presenter, and the other was a participant playing the role of the remote user who listened to the presentation.
As a role-playing scenario, we performed a one-minute scenario on the theme of “Destination of laboratory camp.” The focus of this scenario is that the local user introduces two candidate sites and the remote user chooses one. In this scenario, the Presenter directed the gazing points of the remote user to the left and right flip boards. Table 7 shows the flow of the scenario. The arrangement of the performers and equipment is shown in Fig. 8.
We conducted two evaluations. The first was through video analysis to objectively visualize the robot’s semi-autonomous movements and its control logs. Each video was viewed under the same procedure discussed in Sect. 4. Videos from two viewpoints, remote user and local user, and remote-user operation logs were collected and used for analysis.
The second evaluation was a subjective evaluation to examine the impressions of the participants who took part as remote users. Each participant answered the questionnaire shown in Table 6 each time he/she completed a scenario and optionally wrote comments afterwards. To mitigate order effects, the order of the conditions was counterbalanced using a Latin square.
Similar to experiment 1, we conducted the experiment under four conditions: voluntary-ON/involuntary-ON, voluntary-ON/involuntary-OFF, voluntary-OFF/involuntary-ON and voluntary-OFF/involuntary-OFF.
In the experiment, the local user was an experimental collaborator, a 21-year-old male. The remote users were eight participants, five male and three female, aged 19 to 29 (average 23.88 ± 2.67 years old). Each participant took part as a remote user under the four conditions listed above. None of the participants had taken part in the previous experiment.
To become familiar with the operation, the remote user first carried out a remote conference in accordance with the scenario with all autonomous movements turned off. Next, a remote conference in accordance with the scenario was carried out under each of the four conditions. The execution order of the conditions was counterbalanced using a Latin square to mitigate order effects.
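The counterbalancing can be illustrated with a short sketch. The exact Latin square used in the experiment is not stated, so the cyclic construction and the assignment of participants to rows below are assumptions made for illustration only.

```python
from typing import List

CONDITIONS = [
    "voluntary-ON/involuntary-ON",
    "voluntary-ON/involuntary-OFF",
    "voluntary-OFF/involuntary-ON",
    "voluntary-OFF/involuntary-OFF",
]


def cyclic_latin_square(items: List[str]) -> List[List[str]]:
    """Return an n x n Latin square whose row r is items rotated left by r."""
    n = len(items)
    return [[items[(r + c) % n] for c in range(n)] for r in range(n)]


def assign_orders(n_participants: int, items: List[str]) -> List[List[str]]:
    """Give participant p row (p mod n) of the square.

    Every row is used equally often when n_participants is a multiple of n.
    """
    square = cyclic_latin_square(items)
    return [square[p % len(items)] for p in range(n_participants)]


# Eight remote users and four conditions: each row of the square is used twice.
orders = assign_orders(8, CONDITIONS)
for number, order in enumerate(orders, start=1):
    print(f"participant {number}: {order}")
```

With eight participants, each condition appears in each serial position exactly twice. A cyclic square balances position only; balancing which condition immediately precedes which would require a different construction, such as a Williams design.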
Result 1: Analysis in Accordance with the Time Series
Figure 9 shows the results of the analysis in accordance with the time series outlined in Sect. 4.2.1. The results under all four conditions are shown. The figure shows the robot’s gazing points (Presenter, Flip board A, Flip board B, Other), autonomous movements, and operations entered by the remote operator. The lower part of the figure provides an explanation of the color-coded autonomous operations and control input from the remote user.
The gazing points hardly moved under the voluntary-ON condition. This is because, for example, the remote user was trying to move the gazing point to see the flip board while the autonomous action was trying to keep the gaze on the Presenter. When the remote user’s operations conflicted with autonomous voluntary movements, the remote user’s operations were prioritized. However, the gazing point set by the operator was almost immediately overwritten by the autonomous gaze toward the Presenter.
Result 2: Subjective Evaluation Experiment by Experimental Participants
We used a two-way repeated-measures ANOVA with participants treated as a random factor. No interaction was observed between voluntary and involuntary movements. Therefore, we present the results of the individual analyses and discuss the main effect of each factor. Table 8 shows the results of the two-way repeated-measures ANOVA.
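For a 2 × 2 within-participant design such as this one, each effect in a two-way repeated-measures ANOVA has one degree of freedom, so its F value equals the squared paired t statistic of the corresponding per-participant contrast. The following is a minimal sketch of that computation under this assumption. The function names are hypothetical, and the scores are made-up numbers for illustration, not data from the experiment.

```python
from statistics import mean, variance
from typing import Dict, List, Tuple

# Scores of one participant, keyed by (voluntary_on, involuntary_on).
Scores = Dict[Tuple[bool, bool], float]


def contrast_f(contrasts: List[float]) -> float:
    """F(1, n-1) of a one-df within-participant effect: n * mean^2 / variance."""
    n = len(contrasts)
    return n * mean(contrasts) ** 2 / variance(contrasts)


def rm_anova_2x2(data: List[Scores]) -> Dict[str, float]:
    """Return F values for both main effects and the interaction."""
    voluntary = [(s[True, True] + s[True, False])
                 - (s[False, True] + s[False, False]) for s in data]
    involuntary = [(s[True, True] + s[False, True])
                   - (s[True, False] + s[False, False]) for s in data]
    interaction = [(s[True, True] - s[True, False])
                   - (s[False, True] - s[False, False]) for s in data]
    return {
        "voluntary": contrast_f(voluntary),
        "involuntary": contrast_f(involuntary),
        "interaction": contrast_f(interaction),
    }


def make_scores(tt: float, tf: float, ft: float, ff: float) -> Scores:
    """Build one participant's scores in the order ON/ON, ON/OFF, OFF/ON, OFF/OFF."""
    return {(True, True): tt, (True, False): tf,
            (False, True): ft, (False, False): ff}


# Made-up questionnaire scores for four hypothetical participants.
TOY_DATA = [
    make_scores(2.0, 3.0, 5.0, 4.0),
    make_scores(3.0, 2.0, 6.0, 5.0),
    make_scores(2.0, 2.0, 4.0, 5.0),
    make_scores(1.0, 3.0, 5.0, 4.0),
]
result = rm_anova_2x2(TOY_DATA)
for effect, f_value in result.items():
    print(f"{effect}: F(1, {len(TOY_DATA) - 1}) = {f_value:.2f}")
```

The p-value for each effect follows from the F distribution with 1 and n − 1 degrees of freedom, where n is the number of participants; a statistics package would normally be used for that step.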
The results of the comparison regarding the automation of involuntary movements are shown in Fig. 10. A significant difference was observed at the 1% significance level in overall satisfaction with the proposed architecture, suggesting that autonomy of involuntary movements improves the satisfaction with the architecture.
The results of the comparison regarding the automation of voluntary movements are shown in Fig. 11. A significant difference was observed at the 5% significance level for Q1 and at the 1% level for all other items. The results indicate that the autonomy of voluntary movements reduces the suitability of telepresence robots as a whole for remote users.
Automation of involuntary movements improved the remote user’s satisfaction with the telepresence system. Even though there was no significant difference for the other questions, the system with autonomous involuntary movements obtained the same or better evaluation values than that without.
Regarding the autonomy of voluntary movements, significant differences were observed for all items, suggesting that remote users had negative impressions of the autonomy of voluntary movements.
The result may be due to the frustration felt by the remote user when the autonomous movements conflicted with the remote user input for voluntary movements. We believe that the frustration arose not because the movement options provided failed to cover the movements the remote user desired, but because of whether the voluntary autonomous movements were enabled, since everything else, such as the UI and the list of movements, was the same across conditions.
The results in Fig. 9 also indicate that gazing-point movement was inhibited by collisions between voluntary autonomous movements and remote operation. When voluntary autonomous movements were expressed, there was a comment that “gazing points cannot be controlled”, suggesting that the remote user felt strong discomfort with the operation.
Since voluntary movement is a movement intentionally performed by humans, autonomous generation of appropriate motion is difficult because it requires an estimation of complicated intention. This is why collisions with remote operation frequently occur. Since involuntary movements are not intentionally expressed, autonomous generation of appropriate movements is relatively easy, so it is considered that their influence on the remote user is small.
Summary of the Main Findings
The main finding of this experiment is that the automation of voluntary movements gives remote users a more negative impression. Judging from the video analysis and the participants’ comments, the reason might be frustration with the collisions between the autonomous voluntary movements and the remote operation. On the other hand, the automation of involuntary movements improved the remote user’s satisfaction with the telepresence system.
In this section, we will discuss the effect of automation of the two types of movements. In addition, we will discuss other perceptions throughout the experiments and the limitation of our experimental settings.
Involuntary Autonomous Movements
Involuntary autonomous movements were evaluated more highly than were involuntary nonautonomous movements in both experiments as long as they were not in excess. Therefore, if the total amount of movement is within the appropriate range, automation of involuntary movements is generally recommended. Additionally, in expressing involuntary autonomous movements, the weighted average approach proposed in  is considered effective.
Voluntary Autonomous Movements
Although it was suggested that voluntary autonomous movements are effective for local users, they often conflict with the remote user’s input, which leads to frustration. A previous study  suggests that nodding as a voluntary autonomous movement gives a remote user a sense of agency; therefore, automation of voluntary movements is not always inappropriate. For example, there are two cases in which voluntary autonomous movements are effective. One case is when a remote user is away or does not intend to control the robot. We can achieve appropriate voluntary autonomous movements by implementing the function to detect the absence of remote users or absence of their intention to control the robot. The other case is when the robot can adapt to the remote user, avoiding conflicting autonomous movements. This can be achieved by monitoring the operation of remote users and learning the intention of remote users and appropriate movements online.
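The second case, adapting to the remote user online, could take many forms. One hypothetical form is sketched below, assuming that the system can tell when the user overrides an autonomous gaze movement: it tracks the override rate per gaze target and suppresses targets that the user rejects often. The class name `OverrideAdapter`, its thresholds, and the target labels are illustrative assumptions, not part of the proposed architecture.

```python
from typing import Dict


class OverrideAdapter:
    """Tracks how often the remote user overrides each autonomous gaze target.

    Deciding whether a movement was overridden (for example, the user sends a
    different target shortly afterwards) is left to the caller.
    """

    def __init__(self, max_override_rate: float = 0.5, min_samples: int = 3):
        self.max_override_rate = max_override_rate  # hypothetical threshold
        self.min_samples = min_samples              # evidence needed to adapt
        self._shown: Dict[str, int] = {}
        self._overridden: Dict[str, int] = {}

    def record(self, auto_target: str, overridden: bool) -> None:
        """Log one autonomous movement and whether the user overrode it."""
        self._shown[auto_target] = self._shown.get(auto_target, 0) + 1
        if overridden:
            self._overridden[auto_target] = self._overridden.get(auto_target, 0) + 1

    def allows(self, auto_target: str) -> bool:
        """Return True if autonomous gaze toward auto_target should still run."""
        shown = self._shown.get(auto_target, 0)
        if shown < self.min_samples:
            return True  # too little evidence; keep the default behavior
        rate = self._overridden.get(auto_target, 0) / shown
        return rate <= self.max_override_rate


# Hypothetical session: the user keeps rejecting the gaze toward the presenter.
adapter = OverrideAdapter()
for _ in range(3):
    adapter.record("presenter", overridden=True)
print(adapter.allows("presenter"))     # suppressed after repeated overrides
print(adapter.allows("flip_board_A"))  # untouched target stays enabled
```

Such a counter reacts only after conflicts have already occurred, so it would reduce repeated frustration rather than prevent the first collision.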
Effectiveness of Proposed Telepresence Architecture
Within the range of this experiment, the proposed telepresence architecture cannot be said to be effective because of the frustration it induced in the remote user. However, in consideration of the experimental results, there may be some cases where the telepresence robot with the proposed architecture may be considered effective.
In our experiments, voluntary and involuntary autonomous movements were always fixed as enabled or disabled, but remote users can freely turn the movements on or off in a real environment. Therefore, when a remote user is frustrated with respect to the autonomous control of the robot, as observed in the experiments, he/she can turn off the corresponding autonomous control. On the other hand, if the remote user leaves their seat or if control becomes troublesome, the remote user can instantly turn on autonomous control.
If the remote user is a provider of the service, since the impression from the remote user does not matter, one can improve the local user’s impression even if voluntary movement is constantly expressed.
Discussion on Operation Load
As a motivation for semi-autonomous telepresence robot research, a decrease in operation load can be considered. However, this research does not focus on reducing the operation load but rather on what kind of autonomous movements are effective using currently available technology. For that reason, the experiments were not designed to evaluate the operation load.
In reality, the operation load decreases in the case of no control input conflict but increases in the case of conflicts. Therefore, with the design of the experiments, it is difficult to discuss the cause of any increase and decrease in the operation load.
On the other hand, because the autonomous behavior improves a local user’s impression, if a remote user does not stubbornly oppose the autonomous movement, we can assume that the proposed architecture can improve the local user’s impression with a relatively low operation load.
Discussion on Experimental Design
Initially, the experiment was designed with a target demographic of young participants in their teens and twenties. This demographic was chosen on the premise that the experiment was relatively fast paced. If the participants were elderly, slower-paced experimental scenarios would be needed. However, we must consider the possibility that the frustration felt by the remote user would differ greatly in a slower-paced experiment.
If the above points were resolved and elderly people participated in the experiments, we believe that the participants might operate the robot conservatively or not control it at all. For a conservative remote operator, the autonomous movements would guarantee a variety of movements with little user input and might also reduce the frustration felt by the remote user.
Additionally, to address factors unrelated to automating the movements, we adopted the method of using a video in experiment 1. The approach of using a video might not replicate the embodied interaction sufficiently. However, we think that the technique was sufficient to obtain the evaluation of interactions from an objective perspective. As future work, to investigate the evaluation from a subjective perspective, we need to conduct the experiment so that participants actually participate in an interaction as a local user.
The scenarios in this research were designed so that local users consistently made presentations and remote users listened to the presentations. This design intentionally includes in the scenarios the premise that the most explicit and important aspect of the robot’s gaze is the possibility that it introduces a conflict between the semi-autonomous system and the remote user.
However, if the remote user is a presenter, we think that there is less need for the remote operation to be reflected reliably in the robot than in the scenario addressed in this research. This is because, in the scenario of this research, the remote user needed to obtain specific visual information such as the flip boards, whereas when the remote user is a presenter, the robot’s movements may be richer than those intended by the remote user.
Therefore, we should investigate the effect of the automation of the two types of movements in the other scenarios/contexts.
We analyzed the relationship between the movements of a semi-autonomous telepresence robot and human impressions. To investigate the effect of automating voluntary and involuntary movements, we developed a telepresence robot that can automatically express voluntary and involuntary movements while accepting control from remote users.
We first carried out role-play scenarios assuming a small meeting session and presentations under several conditions and then analyzed the videos recorded in two different ways.
As a result, from the local user’s perspective, we can ascertain that automating voluntary and involuntary movements can provide a good impression depending on the quality of the implementation. From the remote user’s perspective, it was suggested that automating voluntary movements could produce a bad impression due to collisions between the autonomous voluntary movement and remote operation.
As future work, we will consider a method to resolve the collisions by adaptation to remote users. Additionally, we will investigate the effect of the automation in an experiment with a scenario that is more general than that presented in this paper.
Adalgeirsson SO, Breazeal C (2010) Mebot: a robotic platform for socially embodied presence. In: Proceedings of the international conference on human–robot interaction, IEEE Press, Piscataway, pp 15–22
Coltin B, Biswas J, Pomerleau D, Veloso M (2011) Effective semi-autonomous telepresence. In: Robot Soccer world cup. Springer, pp 365–376
Dahlbäck N, Jönsson A, Ahrenberg L (1993) Wizard of oz studies-why and how. Knowl Based Syst 6(4):258–266
De Greef P, Ijsselsteijn WA (2001) Social presence in a home tele-application. CyberPsychol Behav 4(2):307–315
Dow S, MacIntyre B, Lee J, Oezbek C, Bolter JD, Gandy M (2005) Wizard of oz support throughout an iterative design process. IEEE Pervasive Comput 4(4):18–26
Fernando CL, Furukawa M, Kurogi T, Kamuro S, Minamizawa K, Tachi S, et al (2012) Design of telesar v for transferring bodily consciousness in telexistence. In: International conference on intelligent robots and systems. IEEE, pp 5112–5118
Fish RS, Kraut RE, Chalfonte BL (1990) The videowindow system in informal communication. In: Proceedings of the conference on Computer-supported cooperative work. ACM, pp 1–11
Fürler L, Nagrath V, Malik AS, Meriaudeau F (2013) An auto-operated telepresence system for the nao humanoid robot. In: International conference on communication systems and network technologies (CSNT). IEEE, pp 262–267
Hasegawa K, Nakauchi Y (2014) Unconscious gestures that empower turn taking for telepresence robot. Trans Jpn Soc Mech Eng 80(819):DR0321 (in Japanese)
Hayamizu A, Imai M, Nakamura K, Nakadai K (2014) Volume adaptation and visualization by modeling the volume level in noisy environments for telepresence system. In: Proceedings of the second international conference on Human-agent interaction. ACM, pp 67–74
Imai M (2013) The power of a socially embedded robot. J Robot Soc Jpn 31(9):864–867 (in Japanese)
Imai M, Ono T, Ishiguro H (2003) Physical relation and expression: joint attention for human–robot interaction. IEEE Trans Ind Electron 50(4):636–643
Isaacs EA, Tang JC (1994) What video can and cannot do for collaboration: a case study. Multimed Syst 2(2):63–73
Jancke G, Venolia GD, Grudin J, Cadiz JJ, Gupta A (2001) Linking public spaces: technical and social issues. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, pp 530–537
Jeannerod M (2003) The mechanism of self-recognition in humans. Behav Brain Res 142(1–2):1–15
Kashiwabara T, Osawa H, Shinozawa K, Imai M (2012) Teroos: a wearable avatar to enhance joint activities. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 2001–2004
Koizumi S, Kanda T, Shiomi M, Ishiguro H, Hagita N (2006) Preliminary field trial for teleoperated communication robots. In: The 15th IEEE international symposium on robot and human interactive communication (ROMAN). IEEE, pp 145–150
Kristoffersson A, Coradeschi S, Loutfi A (2013) A review of mobile robotic telepresence. Adv Hum Comput Interact 2013:3
Lu JM, Lu C, Chen Y, Wang J, Hsu Y, et al (2011) Tricmini—a telepresence robot towards enriched quality of life of the elderly. In: Proceedings of the Asia Pacific eCare and TeleCare congress
Matsui D, Minato T, MacDorman KF, Ishiguro H (2005) Generating natural motion in an android by mapping human motion. In: International conference on intelligent robots and systems. IEEE, pp 3301–3308
Matsumura R, Shiomi M, Nakagawa K, Shinozawa K, Miyashita T (2016) A desktop-sized communication robot: “robovie-mr2”. J Robot Mechatron 28(1):107–108
Nakamichi D, Nishio S (2016) Effect of agency to teleoperated communication robot by semi-autonomous nod. Trans Jpn Soc Artif Intell 31(2): (in Japanese)
Nakanishi H, Murakami Y, Kato K (2009) Movable cameras enhance social telepresence in media spaces. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp 433–442
Norman DA, Ortony A (2003) Designers and users: two perspectives on emotion and design. In: Proceedings of the symposium on foundations of interaction design at the interaction design institute, Ivrea, Italy
Ogata M, Teramura R, Imai M (2015) Attractive telepresence communication with movable and touchable display robot. In: 2015 24th IEEE international symposium on robot and human interactive communication (RO-MAN). IEEE, pp 179–184
Ogawa K, Nishio S, Koda K, Balistreri G, Watanabe T, Ishiguro H (2011) Exploring the natural reaction of young and aged person with telenoid in a real world. JACIII 15(5):592–597
Paulos E, Canny J (1998a) Designing personal tele-embodiment. In: Proceedings of the IEEE international conference on robotics and automation. IEEE, vol 4, pp 3173–3178
Paulos E, Canny J (1998b) Prop: personal roving presence. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM Press/Addison-Wesley Publishing Co., pp 296–303
Paulos E, Canny J (2001) Social tele-embodiment: understanding presence. Auton Robots 11(1):87–95
Sakamoto D, Kanda T, Ono T, Ishiguro H, Hagita N (2007) Android as a telecommunication medium with a human-like presence. In: Proceedings of the international conference on Human-robot interaction. IEEE, pp 193–200
Shiomi M, Sakamoto D, Kanda T, Ishi CT, Ishiguro H, Hagita N (2008) A semi-autonomous communication robot—a field trial at a train station. In: 3rd ACM/IEEE international conference on human-robot interaction (HRI). IEEE, pp 303–310
Takimoto Y, Hasegawa K, Sono T, Imai M (2017) A simple bi-layered architecture to enhance the liveness of a robot. In: International conference on intelligent robots and systems (IROS). IEEE
Tanaka K, Yamashita N, Nakanishi H, Ishiguro H (2016) Teleoperated or autonomous?: How to produce a robot operator’s pseudo presence in HRI. In: Proceedings of the international conference on Human-robot interaction. IEEE Press, pp 133–140
Tobita H, Maruyama S, Kuzi T (2011) Floating avatar: telepresence system using blimps for communication and entertainment. In: Extended abstracts on human factors in computing systems. ACM, pp 541–550
Tonin L, Leeb R, Tavella M, Perdikis S, Millán JdR (2010) The role of shared-control in BCI-based telepresence. In: 2010 IEEE international conference on systems man and cybernetics (SMC). IEEE, pp 1462–1466
Tonin L, Carlson T, Leeb R, Millán JdR (2011) Brain-controlled telepresence robot by motor-disabled people. In: 2011 annual international conference of the IEEE engineering in medicine and biology society, EMBC. IEEE, pp 4227–4230
Tsui KM, Desai M, Yanco HA, Uhlik C (2011) Exploring use cases for telepresence robots. In: Proceedings of the international conference on Human–robot interaction. ACM, pp 11–18
Tsui KM, Norton A, Brooks DJ, McCann E, Medvedev MS, Allspaw J, Suksawat S, Dalphond JM, Lunderville M, Yanco HA (2014) Iterative design of a semi-autonomous social telepresence robot research platform: a chronology. Intell Serv Robot 7(2):103–119
Yamazaki R, Nishio S, Ogawa K, Ishiguro H, Matsumura K, Koda K, Fujinami T (2012) How does telenoid affect the communication between children in classroom setting? In: Extended abstracts on human factors in computing systems. ACM, pp 351–366
This work was supported in part by JSPS KAKENHI (Grant No. 17J00580) and supported in part by MEXT KAKENHI (Grant No. 26118006).
Conflict of interest
The authors declare that they have no conflict of interest.
Osawa, M., Okuoka, K., Takimoto, Y. et al. Is Automation Appropriate? Semi-autonomous Telepresence Architecture Focusing on Voluntary and Involuntary Movements. Int J of Soc Robotics (2020). https://doi.org/10.1007/s12369-020-00620-5
- Telepresence robot