1 Introduction

Social robots have been developed and are now used in our daily lives. They are deployed in the service industry for various purposes; for instance, social service robots are used as museum guides [1], travel guides [2], shopping guides [3], and for hotel services [4]. Significant advances in machine learning have improved the performance of robots, although the tasks that robots can perform remain limited. In the long term, however, robots will gradually improve in performance and become more necessary in our lives. As robots become more widespread, they are expected to become a labor support technology in society [5] and to provide a new type of customer service as avatar robots [6].

Among the various tasks in the service industry, providing information and presenting advertisements in commercial facilities are expected to be among the roles that robots can play [7]. By using its embodiment, a robot can approach a target customer more directly and efficiently than conventional methods of providing information, such as paper or digital signage. The effects of some robot applications across a variety of services have been validated; for example, consumers order more food when communicating with robots [8]. Moreover, stakeholders, including customers, shop managers, and mall managers, are positive about introducing robots in shopping malls [9, 10]. Consequently, the introduction of robots into commercial facilities is not limited to advertising and providing information; robots are highly expected to become a part of our future. Despite these expectations, it has been reported that social robots in the real world tend to be neglected by users even if the robots talk to them, owing to the limitations of their abilities [11, 12]. To maximize the effectiveness of robots, it is necessary not only to improve their capabilities but also to identify the types of behavior they should display.

Fig. 1 One of the experimental scenes in a shopping mall. The autonomous robot attempts to convey information about the mall to pedestrians

In this study, therefore, we investigated robot behaviors in terms of providing information and advertising to pedestrians in a shopping mall. The setup of the experiment is presented in Fig. 1. In particular, the detailed aims of this study are twofold:

  1. This study explores versatile robot behaviors that do not depend on the specific type of information or advertisement being provided. In particular, it proposes and verifies types of robot behaviors rather than verifying detailed behaviors, such as eye contact and talk timing.

  2. This study verifies how well the robot works in the real world by comparing its performance with that of humans, who provide a baseline.

We believe that these aims will lead to a discussion on how robot behaviors can be applied in general.

In this study, a robot successfully provides information in three main steps: drawing the attention of pedestrians to the robot, making pedestrians stop in front of the robot, and continuing the engagement until the message is delivered. Of these, we believe that getting pedestrians to stop and conveying the information to the end are the most difficult. To give the robot an opportunity to perform well in these two steps, we proposed three types of robot behaviors and conducted experiments with a robot exhibiting these behaviors in a shopping mall (Experiment I). Based on the results, we discuss what kinds of robot behaviors can make people want to communicate with robots. We also examined whether the proposed robot system was more effective than human advertisers in providing information. For robots to support human labor, they are expected to demonstrate performance equivalent to that of humans. Thus, a comparison between the performance of robots and humans is very useful for stakeholders considering the deployment of robots in a real environment. Therefore, four human advertisers were recruited, and a second experiment (Experiment II) was conducted under the same environmental conditions as that of the robot. The human advertisers were simply given the same task as the robot, namely providing pedestrians with information about the store, and they performed the task with some minor constraints. This allows Experiment II to provide a baseline for comparison with the robot performance.

The paper is structured as follows. In Sect. 2, related works are described. The experimental methodologies and results are described in Sects. 3 and 4, respectively. In Sect. 5, the discussion based on the results in the two experiments is provided. Finally, Sect. 6 provides the conclusions and describes future work.

The preliminary research for this study was presented at a conference and published in the proceedings [13], in which we reported limited results on how pedestrians were affected by the robot behaviors. The current paper provides a detailed analysis by biological sex, age, and characteristics of the pedestrians. In addition, the current paper provides a comparison of the performance results between the robot and humans. The comparative results are crucial for the discussion section as an insight into the competence of robots and the diffusion of robots in society. Based on the added results, the introduction, related works, discussion, and conclusion sections have also been refined.

2 Related Works

2.1 Service Robot in Real-World Environment

There are many examples of robots being used to provide information and display advertisements in stores. One aim of such research in real-world environments is to build the robot system itself. For instance, semi-autonomous robots whose dialogues are supported by human operators have been placed at shopping malls and stations [7, 14], and a multiple-robot system has been implemented in a shopping mall [15]. These studies aimed to verify the effectiveness of the system itself, i.e., whether the robot system can be used in the real-world environment, rather than to verify the effects of the robot’s detailed behavior.

Several previous studies focused on comparing and verifying detailed robot behaviors that work effectively with users in real environments. An example of a robot using its mobility is a robot handing out flyers [16, 17]. The behavior of humans distributing flyers was analyzed, and the behavior identified as optimal was implemented in a robot and then tested in a shopping mall. In this work, the robots used their mobility to maximum effect in the task of distributing flyers.

In contrast, stationary robots (or robots that rarely move) have difficulty providing information because their actions are limited: they cannot approach pedestrians themselves. These robots need to attract pedestrians and draw them near through their presence and actions. For the phase after engaging with pedestrians, a study in a museum [18] explored the timing of the robot’s head and gaze actions to increase a pedestrian’s engagement. In a shopping mall, robot behaviors that feel natural to humans have also been validated for situations in which humans approach robots [19]. With respect to the behavior of a stationary robot before engaging pedestrians, the social presence of the robot is important; thus, the talking behavior of a robot has been shown to be effective [20]. In addition, a looking-back behavior is also effective for gaining pedestrians’ attention [21].

A more detailed analysis shows that there are five elements of robot behaviors that are important in terms of engaging with users: eye contact, duration of eye contact, distance to users, approaching users, and laughing. Shiomi et al. investigated different robot sizes and different conversational schemes for providing information to examine detailed robot behavior [22]. Their study revealed that smaller robots that provide specific information have a high success rate in delivering information. Hayashi et al. validated the effectiveness of providing information in terms of the number of robots and the conversation methods [23]. Their results suggest that a passive-social medium, in which two robots talk to each other and thereby provide information to pedestrians indirectly, is better than an interactive-social medium that interacts with the users. There is also a study of virtual agents that change their behavior based on the spatial characteristics of humans, with the aim of enabling robots to engage with humans more effectively [24].

Previous studies of stationary robots have often focused on developing robots that can gain the attention or engagement of pedestrians by implementing and comparing detailed robot behaviors. However, although specific behaviors such as the looking-back behavior tend to encourage pedestrian engagement, other robots may be unable to implement them owing to functional limitations. Therefore, to make common behaviors implementable in more robots, it is important to propose types of robot behavior.

2.2 Performance Comparison between Robots and Humans

When developing a service robot to operate in a real environment, most research aims to enable the robot to accomplish tasks such as providing information and distributing flyers. However, if robots are to be used as a labor support technology, it is important to compare the performance of robots and humans, and a few studies have aimed to do so.

In one study that directly compared task performance, an android robot acting as a salesperson attempted to sell goods at a department store [25]. That study shows that the android robot was able to sell goods as well as a human salesperson. Another study compared a teleoperated robot with a human in the task of distributing food samples [26]. By distributing food samples while passively approaching pedestrians, the robot achieved high performance comparable to that of humans. Such performance has been shown to be influenced by the unique characteristics of the robot. Factors such as the eeriness of the robot can cause discomfort to the consumer, and as a result, compensatory consumer responses such as ordering more food are facilitated [8]. In other words, the high task performance of robots is due not only to their high ability but also to a variety of other factors.

There are also studies that have investigated how users feel when robots (or virtual agents) and humans perform the same task, although they do not directly compare task performance. A comparison of tasks performed by a virtual agent and a human in service encounters shows no difference in service satisfaction [27]. Until recently, users were shown to prefer human services over robot services; however, owing to the influence of COVID-19, robot services have been shown to be particularly preferred in recent years [28]. As in these studies, we can develop more valuable robots not only by showing that robots can accomplish tasks but also by comparing their performance with that of humans.

3 Methodology

The aim of this study is to propose types of robot behaviors that perform well in information-providing tasks and to verify the effectiveness of these behaviors in a real environment. Comparing the proposed robot behaviors is useful for finding the best behavior for the task; however, the best behavior is identified only relative to the other types of robot behaviors. In other words, such a comparison alone cannot establish whether robots can actually be introduced to support humans in an actual environment. Therefore, by comparing the proposed robot behaviors with human performance, we show the possibility for robots to play an active role in the real world.

Accordingly, we conducted two experiments. In Experiment I, we proposed three types of robot behaviors and verified their effectiveness. In Experiment II, human advertisers performed the same task as the robot in the same environment as Experiment I, and we compared their performance with the results of Experiment I.

3.1 Experiment I: With an Autonomous Robot

Experiment I aims to investigate whether the humanoid robot can make pedestrians stop and maintain engagement until they have delivered their intended message. To achieve this, three types of robot behaviors were designed and compared.

For our investigations, we conducted an exploratory field experiment in a large shopping mall (Footnote 1) during July–August 2019. The humanoid robot was present on three weekdays and three weekend days and was available for 6 hours a day. The robot was placed in one of the shopping mall’s corridors so that visitors, such as families, couples, and friends, could freely interact with it. We announced to all pedestrians through a notification board that this was an experiment and that video was being recorded along with the sensor data. The study was conducted on an opt-out basis: unwilling participants could ask to be removed from the video and sensor data. The notification may have changed the pedestrians’ behavior; for example, some pedestrians may have attempted to interact with the robot but quit the interaction owing to the notification. However, no one asked to have their record deleted; thus, the effect of the opt-out procedure on the experimental results is expected to be minimal.

This experiment was approved by the facility authorities in the shopping mall and the Research Ethics Committee from Ritsumeikan University (Reference number: BKC-HitoI-2019-006).

3.1.1 System Configuration

Fig. 2 Interaction system. The robot and printer were installed on the desk, and the five RGB-D sensors were set behind the robot

Fig. 3 Combined image of the five RealSense cameras giving a total 220\(^\circ \) FOV

We built an interaction system containing a humanoid social robot, five RGB-D image sensors, and a printer on a table, as shown in Fig. 2. The robot “Sota”, developed by Vstone Co. Ltd., was used as the social robot in this experiment. The humanoid robot is approximately 0.3 m tall and has functions such as voice and LED-generated facial expressions. The robot has arms with two degrees of freedom (DOF), a head with three DOF, and a body with one DOF. Although the robot was equipped with an RGB camera on its head, we used additional cameras because the field of view (FOV) of the onboard camera was too limited for these experiments. For the 3D image sensors, we used five Intel RealSense D415 sensors, each of which can capture an RGB image (FOV per camera, horizontal \(\times \) vertical \(\times \) diagonal: 69.4\(^\circ \) \(\times \) 42.5\(^\circ \) \(\times \) 77\(^\circ \)) and a depth image (FOV per camera: 65\(^\circ \) \(\times \) 40\(^\circ \) \(\times \) 72\(^\circ \)); the maximum range of the depth sensor was 10 m. A combined image of the five cameras is shown in Fig. 3, and the horizontal FOV of the five sensors used in this study was 220\(^\circ \). In addition, a printer was installed next to the robot to print a voucher for people who finished the interaction with the robot. The voucher could be exchanged for a bottle of water.

To generate robot behaviors according to the behavior of the pedestrians, we used the human detector “NUITRACK,” which can estimate the posture of humans in an image [29]. From the NUITRACK results and the depth image, we calculated the 3D coordinates of all observed pedestrians in the robot’s coordinate system. We used a computer (Intel Core i9-9900K CPU, NVIDIA GeForce RTX 2080 Ti GPU) to obtain the posture of all pedestrians at a rate of 30 frames per second (fps). Human posture could be estimated only within approximately 4 m of the sensors.
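
The coordinate computation described above can be illustrated with a minimal sketch. It assumes a pinhole camera model and a known rigid transform from each sensor to the robot frame. The intrinsic parameters, the yaw angle, the offset, and the function names are hypothetical; they are not taken from the actual system, NUITRACK, or the RealSense SDK.

```python
import math

def deproject(u, v, depth_m, fx, fy, cx, cy):
    """Pinhole deprojection: pixel (u, v) with depth in meters -> camera-frame (x, y, z)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)

def camera_to_robot(point, yaw_deg, offset):
    """Rotate a camera-frame point about the vertical (y) axis by the sensor's yaw,
    then translate by the sensor's position in the robot frame (both assumed known)."""
    x, y, z = point
    yaw = math.radians(yaw_deg)
    xr = math.cos(yaw) * x + math.sin(yaw) * z
    zr = -math.sin(yaw) * x + math.cos(yaw) * z
    ox, oy, oz = offset
    return (xr + ox, y + oy, zr + oz)

# Hypothetical example: a joint seen at the image center of a 640x480 frame, 2 m away,
# by a sensor rotated 45 degrees from the robot's forward axis.
joint_cam = deproject(u=320, v=240, depth_m=2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
joint_robot = camera_to_robot(joint_cam, yaw_deg=45.0, offset=(0.0, 0.0, 0.0))
```

With several sensors fanned out to cover a wide horizontal FOV, each sensor would need its own yaw and offset so that all detections share one frame.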

3.1.2 Interaction Design

The robot had three types of behaviors when pedestrians were near it: “greeting behavior”, “in-trouble behavior”, and “dancing behavior”. This study aims to verify the type of robot behavior rather than to compare detailed behaviors. To explore the types of robot behaviors, we conducted a pre-experiment in which we remotely controlled the robot, installed in front of a convenience store at the university, and performed a variety of behaviors in an attempt to make pedestrians stop. The pedestrians were university students and faculty members unrelated to this study, and all authors took turns as the operator. This situation is similar to that of Experiment I. From the results of the pre-experiment, we determined that there were three main behaviors that many operators attempted during the task: greeting, in-trouble, and dancing behaviors.

In the greeting behavior, many operators attempted to directly start a dialogue with the pedestrians, for example by saying “Hello! How are you doing?” or “Where are you going to?”. The conversation then starts when the pedestrian replies to the robot. In contrast, some operators attempted to have the pedestrian initiate the dialogue rather than the robot. For instance, the robot keeps muttering to itself “I’m in trouble” or keeps dancing, and it does not start a dialogue until the pedestrian starts talking to it. It is important for the robot to create an opportunity that makes pedestrians want to talk to it. Instead of starting a conversation when the pedestrian responds to the robot, the robot should make pedestrians feel that they themselves have spoken to the robot. Thus, we used the in-trouble and dancing behaviors, which attracted pedestrians well in the pre-experiment, for the experiment in the shopping mall.

This experiment aimed to compare the three types of robot behaviors as the manipulated factor, rather than differences in the detailed movements of the robot. We verified the influence of the type of robot behavior using a between-participants design at the same experimental location on different dates. Therefore, only a single behavior was presented each day, and each behavior was assigned two days (one weekday and one weekend day).

3.1.3 Procedure

The robot motion common to the three behaviors is face-to-face contact with the pedestrian closest to the robot, which clarifies whom the robot is targeting. Face-to-face contact has proven to be effective in human–robot interaction in various respects, such as conveying the robot’s attention [30]. In the greeting behavior, the robot makes hand-raising gestures to the pedestrians and says “Hello! Please talk with me!” In the in-trouble behavior, the robot behaves as if it has a headache and keeps muttering to itself, as if seeking someone’s help, “I’m in trouble. What should I do?” By pretending to be in trouble, the robot creates an opportunity for the pedestrian to want to talk to it. In the dancing behavior, the robot dances while saying “I’m dancing! Let’s play with me!” This also creates an opportunity for the pedestrian to want to talk to the robot, in this case for fun.

These behaviors were performed while the posture of at least one pedestrian was being measured (the maximum range for obtaining postures was 4 \(\times \) 4 m in the lateral and depth directions) until the pedestrian stopped in front of the robot. The robot determined that a pedestrian had stopped if the pedestrian stayed within an area of 1.2 \(\times \) 2.5 m (lateral and depth directions) from the sensors for 3 s. After the pedestrian stopped, the robot first said “Can you listen to me? Thank you!” and then talked about a store in the shopping mall for 13, 19, or 26 s, depending on the scenario. The connecting statement was inserted between the remarks used to stop the pedestrian and the store information so that the robot’s story would feel natural. At the beginning of every talk scenario, the robot said “At (store name), you can buy a very delicious (recommended item name).” The talk scenario was selected randomly, regardless of the type of behavior. While the robot was talking, it did not interact with the passersby; this is called a passive medium [23]. When the pedestrian finished listening to the robot, the printer next to the robot printed a voucher. The pedestrian could exchange the voucher for a bottle of water in the store that the robot was advertising.
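
The stop criterion described above (remaining inside a 1.2 \(\times \) 2.5 m area for 3 s) can be sketched as a small dwell-time check. This is an illustrative sketch only: the class name is hypothetical, and it assumes that the area is centered laterally on the sensors, that depth is measured from the sensors, and that the timer resets whenever the pedestrian leaves the area. None of these details is stated in the system description.

```python
class StopDetector:
    """Dwell-time check for one tracked pedestrian (hypothetical sketch)."""

    def __init__(self, width_m=1.2, depth_m=2.5, dwell_s=3.0):
        self.half_width = width_m / 2.0  # assumption: zone centered on the sensors
        self.depth = depth_m
        self.dwell = dwell_s
        self.entered_at = None           # time at which the pedestrian entered the zone

    def in_zone(self, x, z):
        """x: lateral offset (m), z: distance in front of the sensors (m)."""
        return abs(x) <= self.half_width and 0.0 <= z <= self.depth

    def update(self, t, x, z):
        """Feed one posture sample at time t (s); return True once the dwell time is reached."""
        if not self.in_zone(x, z):
            self.entered_at = None       # assumption: leaving the zone resets the timer
            return False
        if self.entered_at is None:
            self.entered_at = t
        return (t - self.entered_at) >= self.dwell

# Example: a pedestrian standing 1.5 m in front of the sensors, sampled at 30 fps.
detector = StopDetector()
stopped = False
for frame in range(120):
    stopped = detector.update(frame / 30.0, 0.2, 1.5)
    if stopped:
        break
```

A real system would keep one such detector per tracked pedestrian and would need to handle tracking dropouts, which this sketch ignores.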

This series of robot behaviors was performed fully automatically. As explained above, the robot behavior was not determined by verbal interaction with the pedestrian, because the commercial facility was noisy and the robot system could not accurately recognize what pedestrians said to the robot. Therefore, the robot behavior was generated automatically by estimating the pedestrian’s state based only on the posture data. Each behavior can be watched in motion in a video that was presented in the preliminary study (YouTube Link: [31]).

Fig. 4 Paths of all the labels that were annotated from the recorded video. The red and blue paths represent the stop rate (SR) and the distribution success rate (DSR). The path through red-yellow-blue denotes the whole distribution success rate (WDSR)

3.1.4 Measurement

Throughout the experiment, we recorded videos, as shown in Fig. 3, to analyze the behavior of all pedestrians. Thereafter, we labeled all pedestrians who walked in front of the robot, the pedestrians who stopped in front of the robot, and the pedestrians who received the voucher. In addition to counting the people who stopped in front of the robot, we labeled the pedestrians who stopped while excluding situations where only small children were present. Children-only situations were excluded from this “pedestrians stopped” label because the human detector NUITRACK cannot detect small children owing to occlusion by the robot and the desk. Thus, the proposed robot system was unable to proceed to the information provision phase when only small children were in front of the robot. On the other hand, when tall children were present or when small children and adults were together, the system could proceed to the next phase. In these situations, children were also counted when evaluating the rate of receiving vouchers. The number of people who received the voucher was labeled according to the talk scenario (A: 13 s, B: 19 s, and C: 26 s). The detailed path of all labels is shown in Fig. 4.

The features extracted from the recorded videos include the labeled behavior, the apparent biological sex of the pedestrian(s), and the estimated age (child under 12 years old or adult). This annotation was applied to all pedestrians who passed by the robot in the video. If the same person appeared more than once in a day, each appearance was annotated without personal identification. To ensure valid results, video data annotation was performed by two coders. One was the author, Y. O., and the other was a person unrelated to this study, who was hired as a part-time worker. To ensure uniform criteria for judging “stop” behaviors in front of the robot, the two coders worked together to determine the criteria before annotating. Both coders annotated the data for one day, and the analysis of these overlapping annotations showed good agreement (Cohen’s kappa = .894).
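
For reference, Cohen’s kappa corrects the observed agreement between two coders for the agreement expected by chance. The following is a minimal sketch of the standard formula; the label values are invented for illustration and are not data from this study.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both coders used a single identical label
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels for four pedestrians (not data from the experiment).
coder_1 = ["stop", "pass", "pass", "stop"]
coder_2 = ["stop", "pass", "stop", "stop"]
kappa = cohens_kappa(coder_1, coder_2)
```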

We used three indices for the evaluation: the stop rate (SR), the distribution success rate (DSR), and the whole distribution success rate (WDSR). The SR is the ratio of the number of pedestrians who stopped to the number of all pedestrians (red path in Fig. 4). The DSR is the ratio of the number of pedestrians who received the voucher to the number of pedestrians who stopped, excluding children-only situations (blue path in Fig. 4). The WDSR is the ratio of the number of pedestrians who received the voucher to the number of all pedestrians (path through red-yellow-blue in Fig. 4).
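
The three indices follow directly from the four labeled counts. The sketch below uses a hypothetical function name and invented counts, not the values reported in Table 1.

```python
def evaluation_indices(n_all, n_stopped, n_stopped_excl_children_only, n_voucher):
    """Return (SR, DSR, WDSR) as fractions, following the definitions above."""
    if n_all <= 0 or n_stopped_excl_children_only <= 0:
        raise ValueError("denominators must be positive")
    sr = n_stopped / n_all                          # stop rate
    dsr = n_voucher / n_stopped_excl_children_only  # distribution success rate
    wdsr = n_voucher / n_all                        # whole distribution success rate
    return sr, dsr, wdsr

# Invented counts for illustration only.
sr, dsr, wdsr = evaluation_indices(
    n_all=1000, n_stopped=80, n_stopped_excl_children_only=60, n_voucher=20
)
```

Note that, under these definitions, the WDSR is generally not the product of the SR and the DSR in Experiment I, because the DSR denominator excludes children-only stops whereas the SR numerator includes them.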

3.1.5 Hypothesis

The greeting behavior was considered the baseline because it is often used to get the attention of pedestrians (e.g., [32]). On the other hand, previous studies have demonstrated that emotional robots [33] and human-dependent robots [34], which are similar to the idea of the in-trouble behavior, can achieve higher engagement with users. In addition, in a collaborative task between humans and robots, the robot’s request for help increases people’s behavioral willingness [35]. Therefore, we expect the in-trouble behavior to demonstrate higher performance (SR, DSR, and WDSR) in this experiment. High performance (SR, DSR, and WDSR) is also expected for the dancing behavior because dancing as a rhythmic interaction has achieved higher engagement with children [36].

3.2 Experiment II: By Humans

Experiment I aims to identify robot behaviors that can make pedestrians stop and maintain engagement with them. Comparing the three types of robot behaviors is useful for finding the best behavior; however, the best behavior is identified only relative to the two other types of robot behaviors. Therefore, we cannot argue from Experiment I alone that these robot behaviors are more competent than human behaviors. If robots are to be used as a labor support technology, a better performance by robots than humans is desirable. In addition, clarifying the tasks in which robots are superior to humans provides invaluable insights into collaboration with humans. Several studies have examined whether robots can play an active role in the real world by comparing the performance of robots and humans (e.g., [25]). Thus, to compare the results generated by the robot with those from humans, we conducted a second experiment in which the robot was replaced by a person in an equivalent experimental environment.

The experiment was conducted at the same location as Experiment I in November 2019. As in Experiment I, the experiment was recorded, announced with a notification board, and conducted on an opt-out basis.

This experiment was approved by the facility authorities in the shopping mall and the Research Ethics Committee from Ritsumeikan University (Reference number: BKC-HitoI-2019-006-1).

3.2.1 Human Advertisers

Four people who had experience in distributing flyers were recruited through a temporary agency to participate in the experiment (two males and two females; average age: 23.75 years). The experiment was conducted over 4 days, and each human advertiser performed the task each day. All the human advertisers provided informed consent, which allowed for the use of the collected data for scientific purposes and publication. The human advertisers received 9,500 JPY a day, and they also received a reward in accordance with their performance.

3.2.2 Interaction Design

The human advertisers were instructed to encourage pedestrians to stop and to provide them with information about the shop, which is the same purpose as the robot task in Experiment I. Unlike in Experiment I, the advertisers were allowed to use the shop’s leaflet. The task required the human advertisers to provide information in their usual way. Because one of the goals of Experiment II was to compare the robot’s performance with the usual performance of humans, human behavior was restricted as little as possible. After a pedestrian stopped in response to the advertiser’s attempt, the advertiser was required to provide information about the shop, such as recommended products. It was forbidden to annoy pedestrians or force them to stop, such as by interfering with their walking or saying that water was being distributed. When the pedestrian finished listening to the information about the shop, the advertiser gave them a voucher that could be exchanged for a bottle of water.

As a standard for the area in which the advertiser could move during the experiment, an area of 1.2 \(\times \) 2.5 m (lateral and depth directions) was specified, which is the same area used to determine whether pedestrians had stopped in Experiment I. Thus, the main difference between Experiments I and II was whether the leaflet was used; other conditions were set up to be similar. However, we treated Experiments I and II as separate experiments because the conditions were not completely consistent.

The human advertisers were allowed to practice the experimental task for approximately 20 min before the experiment. The advertisers performed three sets of 50 min of executing the experimental task followed by 10 min of rest. To provide motivation to the advertisers, an additional reward of 500 JPY for every 10 vouchers was given to the advertisers. The scene during Experiment II is shown in Fig. 5.

Fig. 5 Example scene in Experiment II with a human advertiser conveying information to pedestrians

3.2.3 Measurement

We labeled several indices from the recorded video, similar to the labels in Experiment I shown in Fig. 4. Whereas the robot system in Experiment I could not recognize situations with only small children, human advertisers can recognize small children; therefore, “Pedestrians stopped excluding children-only situations” was not labeled in Experiment II. In other words, all pedestrians, including small children, were evaluated. Instead of the paths for each scenario, the time during which a pedestrian stopped in front of the advertiser was annotated. This annotation was applied to all pedestrians who passed in front of the advertiser in the video. To ensure valid results, video data annotation was again performed by two coders. One was the author, Y. O., and the other was a person unrelated to this study. Both coders annotated the data for one day, and the analysis of these overlapping annotations indicated good agreement (Cohen’s kappa = .917).

We used the same three indicators (SR, DSR, and WDSR) for the performance evaluation as in Experiment I. By comparing the performance of robots and humans using these indicators, we are able to discuss whether robots can be introduced in the real environment.

3.2.4 Hypothesis

In some studies comparing the performance of robots and humans, the performance of robots is comparable to that of humans [25, 26]. Therefore, in this study as well, we assume that the robot will show an overall performance (WDSR) comparable to that of humans in information-providing tasks. We also assume that the SR of the robot is higher than that of humans because robots are more interesting than humans, and that the DSR of the robot is lower than that of humans because the robot cannot interactively communicate with passersby.

4 Results

4.1 Results of Experiment I

Table 1 Total number of pedestrians that walked in front of the robot, pedestrians who stopped in front of the robot, pedestrians stopped while excluding only children, and pedestrians who received the voucher

Fig. 6 Results of the stop rate (SR), distribution success rate (DSR), and the whole distribution success rate (WDSR) according to each robot behavior in Experiment I

The numbers of labeled pedestrians are shown in Table 1, and the SR, DSR, and WDSR for each robot behavior are shown in Fig. 6. We tested the differences in the number of pedestrians among the behavior conditions with a chi-squared test, using Cramér’s V as the effect size for all chi-squared test results. The results revealed significant differences among the behavioral conditions: \((\chi ^2(2) = 333.64, p < 0.01, V = 0.05)\) in the SR, \((\chi ^2(2) = 81.61, p < 0.01, V = 0.10)\) in the DSR, and \((\chi ^2(2) = 252.14, p < 0.01, V = 0.04)\) in the WDSR. The residual analysis in comparison with the mean across all behaviors showed that (1) the greeting behavior had low SR and WDSR, (2) the in-trouble behavior had high SR, DSR, and WDSR, and (3) the dancing behavior had a high SR but low DSR and WDSR. Therefore, our results show that the robot behaving as though it is in trouble makes pedestrians stop more and stay longer in front of the robot than the greeting and dancing behaviors. Meanwhile, the dancing behavior made many pedestrians stop, but the stopped pedestrians did not listen to the robot’s talk for long.
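
The type of analysis reported above can be sketched for a behavior-by-outcome contingency table as follows. This is a hedged illustration rather than the analysis code used in the study: the counts are invented, the function name is hypothetical, and the use of adjusted standardized residuals is an assumption, since the exact form of the residual analysis is not specified.

```python
import math

def chi_squared_analysis(table):
    """Pearson chi-squared statistic, degrees of freedom, Cramer's V, and
    adjusted standardized residuals for a contingency table (list of rows)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp
            # Adjusted residual (assumed form): |value| > 1.96 suggests a cell
            # that deviates from independence at roughly the 5% level.
            denom = math.sqrt(exp * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            res_row.append((obs - exp) / denom)
        residuals.append(res_row)
    rows, cols = len(table), len(table[0])
    dof = (rows - 1) * (cols - 1)
    cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))
    return chi2, dof, cramers_v, residuals

# Invented counts: rows = greeting, in-trouble, dancing; columns = stopped, did not stop.
counts = [[50, 950], [90, 910], [85, 915]]
chi2, dof, v, residuals = chi_squared_analysis(counts)
```

The sketch omits the p-value, which requires the chi-squared distribution; a statistics library such as SciPy (`scipy.stats.chi2_contingency`) provides the statistic and the p-value together. With very large samples, even a small Cramér’s V can yield a significant chi-squared result, which is why the effect size is reported alongside the test.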

Fig. 7 Results of the stop rate (SR) and the distribution success rate (DSR) according to the biological sex in Experiment I

Fig. 8 Results of the stop rate (SR) and the distribution success rate (DSR) according to the age in Experiment I

As described in the detailed analysis, the additional results for the biological sex difference in the SR and DSR, the age difference in the SR and DSR, and the scenario difference in the DSR are shown in Figs. 7, 8, and 9. We tested these differences with Chi-squared tests. For biological sex, there were significant differences in the SR \((\chi ^2(1) = 21.50, p < 0.01, V = 0.02)\) and the DSR \((\chi ^2(1) = 5.26, p = 0.02, V = 0.04)\). For age, there was a significant difference in the SR \((\chi ^2(1) = 2666.67, p < 0.01, V = 0.20)\), but no significant difference in the DSR \((\chi ^2(1) = 1.10, p = 0.29, V = 0.02)\). These results indicate that females and children are more likely to stop in front of the robot. However, age made no significant difference in whether pedestrians listened to the message to the end, and the biological sex difference in the DSR, although significant, was small. In terms of the talk scenario (Scenario A: 13 s, Scenario B: 19 s, Scenario C: 26 s), there was a significant difference in the DSR \((\chi ^2(2) = 77.37, p < 0.01, V = 0.12)\). The findings indicate that pedestrians are more likely to listen to the robot's talk in its entirety when the scenario is shorter.

Next, pedestrian behavior differed by hour; Fig. 10 shows the SR and DSR results for each hour. As the start time of the experiment differed among conditions, we tested the average results for all conditions from 12 pm to 4 pm, the time window common to all conditions, with a Chi-squared test. The average SR for each time window (12, 1, 2, 3, and 4 pm) was 6.2, 7.8, 7.7, 8.8, and 8.4 %, respectively, and the average DSR was 31.0, 28.2, 35.7, 36.9, and 37.2 %, respectively. There were significant differences among the time windows in the SR \((\chi ^2(4) = 65.76, p < 0.01, V = 0.02)\) and the DSR \((\chi ^2(4) = 21.28, p < 0.01, V = 0.04)\). The residual analysis in comparison with the mean across all times showed that (1) 12 pm had a low SR ratio, (2) 1 pm had a low DSR ratio, and (3) 3 pm and 4 pm had high SR and DSR ratios. These results suggest that it is more difficult for robots to approach pedestrians during the hours close to lunchtime.

Finally, Table 2 shows the ratio of single pedestrians to groups among those who stopped in front of the robot and those who listened to its message until the end. The differences among the behavioral conditions were small, both for the pedestrians who stopped and for the pedestrians who received the voucher. In addition, pedestrians in a group were more likely than those who were alone to stop in front of the robot and listen to its entire message.

4.2 Results of Experiment II

The numbers of labeled pedestrians are shown in Table 3, and the SR, DSR, and WDSR for each advertiser are shown in Fig. 11. The results show large individual differences in the SR, DSR, and WDSR. Advertiser 3 had the lowest SR, and Advertiser 2 had the lowest DSR. In terms of the WDSR, the overall index for providing information, Advertiser 4 handed out the most vouchers to the pedestrians. In an additional analysis, we measured how long the pedestrians who received the voucher listened to the advertiser's message, as shown in Fig. 12. Advertiser 4 required the least time to provide information before handing out the vouchers, with an average time of less than 20 s. These results suggest that the shorter the time needed to provide the information, the better the DSR.

Fig. 9 Results of the distribution success rate (DSR) according to each scenario in Experiment I

Next, to address the aims of Experiment II, we compared the results of the robot behaviors with those of the human advertisers. In terms of the SR, the average results of the robots for the weekday and weekend were 5.7, 10.6, and 7.7 % for the greeting, in-trouble, and dancing behaviors, respectively. In contrast, the results for each advertiser were 3.5, 3.5, 0.8, and 4.1 %, respectively. Every robot behavior outperformed every human advertiser in the SR, and a Chi-squared test with Cramér's V comparing the total values of the robots with those of the advertisers also showed a significant difference: \((\chi ^2(1) = 636.84, p < 0.01, V = 0.08)\). In other words, the robots can perform the stopping task more easily.

For the DSR, the average results of the robots for the weekday and weekend were 33.9, 42.0, and 26.0 % for the greeting, in-trouble, and dancing behaviors, respectively. In contrast, the results for each advertiser were 60.9, 43.9, 51.4, and 86.2 %. Every advertiser outperformed every robot behavior in the DSR. The comparison of the total values of the robots and the advertisers also showed a significant difference: \((\chi ^2(1) = 158.51, p < 0.01, V = 0.19)\). This is the opposite of the SR result. We also measured the length of the stop time by randomly sampling approximately 20% of the pedestrians who stopped in front of the robot in Experiment I. The average length of the stop time was 21.5 s for all robot behaviors and 22.0 s for all human advertisers. We tested this difference with an unpaired t-test, which showed no significant difference: \((t(678) = 0.25, p = 0.80, d = 0.02)\). This indicates that, although the length of the stop time did not differ significantly between robots and human advertisers, the DSR is higher for the latter.
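The stop-time comparison can be illustrated with the following sketch of an unpaired t-test with Cohen's d. It assumes the pooled-variance (Student) form; the text does not state which variant was used, although the reported 678 degrees of freedom would be consistent with a pooled test on 680 sampled stops. The function name `unpaired_t_test` and the durations are hypothetical and are not the measured data.

```python
import math
from statistics import mean, variance


def unpaired_t_test(a, b):
    """Student's unpaired t-test (equal variances assumed) with Cohen's d.

    Returns (t, dof, d), where d uses the pooled standard deviation.
    """
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / dof
    pooled_sd = math.sqrt(pooled_var)
    diff = mean(a) - mean(b)
    t = diff / (pooled_sd * math.sqrt(1 / n_a + 1 / n_b))
    d = diff / pooled_sd
    return t, dof, d


# Hypothetical stop durations in seconds (illustration only).
robot_stops = [12.0, 25.5, 18.0, 30.0, 22.0, 21.5]
human_stops = [15.0, 24.0, 20.0, 28.5, 22.5]

t, dof, d = unpaired_t_test(human_stops, robot_stops)
print(f"t({dof}) = {t:.2f}, d = {d:.2f}")
```

With the pooled form, t equals d divided by the square root of (1/n_a + 1/n_b), so a near-zero effect size such as the reported d = 0.02 stays non-significant even with several hundred samples.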

For the WDSR, the overall index for providing information, the average results of the robots for the weekday and weekend were 1.6, 3.6, and 1.6 % for the greeting, in-trouble, and dancing behaviors, respectively. In contrast, the results for each advertiser were 2.1, 1.5, 0.4, and 3.5 %. The results of the greeting and dancing behaviors are comparable to the average human performance. On the other hand, with the in-trouble behavior, the robot performed better than Advertiser 4, who demonstrated the best performance among the advertisers. The comparison of the total values of the robots and the advertisers showed a significant difference: \((\chi ^2(1) = 17.14, p < 0.01, V = 0.01)\). Consequently, these results show that the robots are able to perform similarly to humans or even better.
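As a quick consistency check on the three indices, the sketch below multiplies the reported SR and DSR percentages and compares the product with the reported WDSR. It assumes that the SR is the share of passersby who stopped, the DSR is the share of stopped pedestrians who received the voucher, and the WDSR is the share of passersby who received the voucher; these readings, the dictionary layout, and the helper name `wdsr_from_product` are assumptions made for illustration. The percentages themselves are the ones quoted in the text.

```python
# Percentages quoted in the text: (SR, DSR, reported WDSR).
advertisers = {
    "Advertiser 1": (3.5, 60.9, 2.1),
    "Advertiser 2": (3.5, 43.9, 1.5),
    "Advertiser 3": (0.8, 51.4, 0.4),
    "Advertiser 4": (4.1, 86.2, 3.5),
}
robots = {
    "greeting": (5.7, 33.9, 1.6),
    "in-trouble": (10.6, 42.0, 3.6),
    "dancing": (7.7, 26.0, 1.6),
}


def wdsr_from_product(sr_pct, dsr_pct):
    """WDSR in percent, if WDSR were exactly SR x DSR (an assumption)."""
    return sr_pct * dsr_pct / 100.0


for group in (advertisers, robots):
    for name, (sr, dsr, wdsr) in group.items():
        product = wdsr_from_product(sr, dsr)
        print(f"{name:>12}: SR x DSR = {product:.2f} %, reported WDSR = {wdsr} %")
```

Under these assumptions, the product reproduces the reported WDSR of every human advertiser to one decimal place, whereas the product exceeds the reported WDSR for every robot behavior. One possible explanation, which the text here does not confirm, is that the SR and DSR of the robot are computed over different sets of stopped pedestrians (Table 1 lists stopped pedestrians both with and without children).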

Finally, Table 4 shows the ratio of single pedestrians to groups among those who stopped in front of the human advertiser and those who listened to their message. The strategies of the human advertisers differed considerably. Advertisers 1 and 2 had a large percentage of successful stops and distributions for single pedestrians, probably because they tended to talk to a large number of individual pedestrians. Meanwhile, Advertisers 3 and 4 had a large percentage of stops and successful distributions for groups. These results differed from those of Experiment I, in which the ratios varied little across conditions; that is, the ratios in Experiment I were largely independent of the robot behaviors, whereas those in Experiment II depended on the individual advertiser.

Fig. 10 Results of the stop rate (SR) and the distribution success rate (DSR) according to the time window in Experiment I

Table 2 Ratio of single pedestrians to groups among those who stopped in front of the robot and those who listened to the robot's message until the end

5 Discussions

5.1 Discussion on Experiment I

This study mainly aims to investigate whether the humanoid robot can make pedestrians stop in front of it and maintain engagement with them. Thus, we designed three types of behaviors: greeting, in-trouble, and dancing behavior.

5.1.1 Comparison of three types of robot behaviors

The results of Experiment I show that the in-trouble behavior performed best among the proposed behaviors in terms of both attracting pedestrians and providing information. This result is consistent with the hypothesis that the in-trouble behavior shows high performance in all indicators. The dancing behavior also performed well, close to the in-trouble behavior, but only in making people stop. This result is consistent with the hypothesis that the dancing behavior shows high performance in the SR, but it does not agree with the hypothesis that its DSR and WDSR are high.

Table 3 Total number of pedestrians, pedestrians who stopped, and pedestrians who received the voucher
Fig. 11 Results of the stop rate (SR), distribution success rate (DSR), and the whole distribution success rate (WDSR) according to each human advertiser in Experiment II

The difference in these results is probably due to the different reasons why the pedestrians approached the robot. When humans see weak or human-dependent robots, they tend to increase their engagement with the robots because they want to help them [34]. This is similar to the way an adult reaches out to a child who is in trouble. Therefore, it seems that many pedestrians were willing to listen to the robot in the in-trouble condition because the in-trouble behavior evokes a similar response. In contrast, for the dancing behavior, we assume that the most common reason pedestrians approached the robot was for fun. Although they approached the robot to enjoy its dance, the pedestrians felt disconnected when the robot started talking about the shop's information after they approached it. Thus, it can be assumed that this gap is why the DSR of the dancing behavior was the lowest among all the proposed behaviors.

Consequently, these results suggest that the in-trouble and dancing behaviors are more effective than the greeting behavior in making pedestrians stop. They are designed to create an opportunity that makes pedestrians want to talk to the robot. Instead of the robot starting a conversation and waiting for the pedestrian to respond, the robot makes pedestrians feel that they were the ones who spoke to the robot. In this way, we consider that starting a conversation triggered by the user's own behavior will help improve the performance of the robot. In addition, even with such an effective type of behavior, consistency in the robot's behavior is crucial for maintaining its engagement with the pedestrians.

Fig. 12 Results of the duration for which pedestrians stopped in successful distributions in Experiment II. The error bars represent the standard error of the mean

Table 4 Ratio of single pedestrians to groups among those who stopped in front of the human advertiser and those who listened to their message

5.1.2 Comparison of biological sex and age differences

Other interesting results are the biological sex and age differences in the SR and DSR. The results of the SR and DSR showed that women, as well as children, were more likely to stop in front of the robot and listen when the robot talked. The same situation has been reported in another study [19], wherein children, sometimes accompanied by their parents, often interacted with the robots. In addition, previous studies indicated that men have a more positive attitude toward interacting with robots than women [37, 38]. However, another study [39] that used the robot “Sota,” the same robot as in the present study, showed that women are more interested in the robot; this finding is consistent with our results. Therefore, the results suggest that the biological sex difference in interest in robots is not definite and may depend on the appearance of the robot. Meanwhile, the results of the DSR did not show a significant difference by age. This robot was non-interactive; that is, it did not have the ability to communicate with humans through dialogue. In this case, we assume that the DSR reflects the time at which pedestrians lost interest in the robot, and that this time was similar regardless of age. In other words, even when a robot is interacting with a person who is strongly interested in it, a robot with poor interactive ability will quickly become boring.

5.1.3 Comparison of pedestrian group sizes

Finally, we found that pedestrians in a group are more likely to stop and more willing to listen to the full message. Similar observations have been reported in other studies [17, 24, 26, 39]. These observations were thought to be caused by another person who draws the pedestrian into interacting with the robot, and not by the robot itself [17, 24]. Therefore, in this study, the reason for this observation can be assumed to be that if one person in the group is interested in the robot, the others wait for that person and thus also listen to the robot. Therefore, in crowded places such as shopping malls, it is more efficient for robots to provide information to a group than to an individual.

5.2 Discussion on Experiment II

5.2.1 Comparison of performances between robot and human

We also compared the results of the robots and the humans in Experiment II. The SR was higher for the robots, and the DSR was higher for the humans. This result is consistent with the hypothesis. The low DSR of the robot can be attributed to its verbal interaction ability, which was significantly lower than that of humans. We could not implement verbal interaction in the robotic system because speech recognition does not work reliably in a noisy commercial environment. In future studies, this problem may be solved by developing a technology that can recognize the speech of pedestrians correctly even in noisy environments.

In contrast, there are several possible reasons why the robots outperformed humans in terms of the SR. The first is simply that pedestrians were strongly interested in the robot's behavior. Even the greeting behavior, which had the lowest SR among the robot behaviors, achieved a higher SR than the humans, probably owing to the novelty effect [40]. In addition, the other behaviors improved the SR drastically. Thus, different types of robot behaviors can attract pedestrians. The second is that pedestrians did not like being called out to by adult human advertisers offering information. As mentioned in the discussion on Experiment I, an adult tends to reach out to a child when the child is in trouble. On the other hand, in Japan, where the experiment was conducted, a strange adult who talks to a person in a shopping mall or in a town is often a salesperson recommending something, which pedestrians consider a nuisance. Therefore, pedestrians avoid conversations with strangers before they know what type of information will be provided. Accordingly, we believe that this lowered the SR of the adult advertisers; the SR could be higher if a child advertiser performed the same task (if it were ethically possible).

The results of the WDSR for the greeting and dancing behaviors of the robots are comparable to human performance; however, the in-trouble behavior performance exceeded all the results of the human advertisers. This result is partially inconsistent with the hypothesis that the performance of robots is comparable to that of humans. However, this result shows the positive possibility that robots can play an active role in the real world. While the SR and DSR are part of the process in evaluating the performance of an informational task, the WDSR is the final evaluation of performance. In other words, in this experiment, the robot succeeded in providing more information to the pedestrians than the human advertisers. Therefore, the results may indicate that robots are more effective in the information provision task.

5.2.2 Robot Advantages

These results may be useful for the future design of collaborative systems in which robots support humans. For example, with a remote avatar robot system such as those in [6, 14], it is possible to build a higher-performance system by integrating the capabilities of robots with those of humans. In this study, the environment was a noisy shopping mall in which it is difficult to recognize the speech of a particular individual. Thus, we constructed a passive medium system in which the robot talks unilaterally. However, when it comes to interacting with pedestrians, a human operator can step into the dialogue to improve the interactivity. The autonomous robot attempts to make pedestrians stop in front of it, and a human takes over in situations where it is difficult for the autonomous robot to talk with the pedestrians. Thus, we expect to build a system that achieves high performance by letting robots and humans compensate for each other's weaknesses.

In addition, this result does not consider the decrease in human performance over time. In this experiment, the advertisers were asked to perform the task for 3 h, and performance degradation during this time was not measured. However, if they perform the task over a longer period, performance degradation due to fatigue is likely to occur. In that case, the robots may deliver better results than humans in the work environment.

In Experiment II, we observed that the SR and WDSR of Advertiser 3 were lower than those of the other advertisers. This is because human advertiser 3 approached pedestrians less often than the other advertisers (Number of people passing per approach for each advertiser: 4.6, 5.6, 46.9, and 10.9, respectively). On the other hand, this phenomenon does not occur with robots; thus, robots have the advantage in that their performance is not affected by factors such as human individual differences. In summary, the results from this study suggest that robots can be sufficient as a labor support technology, which is one of the goals of robotic research.

5.3 Limitations

Finally, we want to present the limitations of this study. First, we compared the performances of the three types of robot behaviors. However, it is unclear whether two types of robots that are implemented with the same type of behavior but have slightly different details of motion can achieve similar results. This is a limitation that we need to explore in future studies, which also considers the difference in the robot’s appearance and degrees of freedom.

In addition, this study cannot demonstrate whether the proposed robot behavior always produces the same results. We conducted the experiment with the robot in a commercial facility where many people have relatively more time to spare. However, through the experiments, we found that most of the robot's approaches failed for people in a hurry. Therefore, depending on environmental conditions such as the context and location where the robot is installed, we cannot guarantee that the in-trouble behavior will have as strong an effect on people as it did in this study.

Next, our results strongly depend on the novelty effect. In the in-trouble behavior, the robot said “I’m in trouble” to attract pedestrians’ attention and then conveyed information about the store. This may have been a type of “crying wolf.” Further, in the field of HRI, it has been reported, for example, that robot errors decrease people’s trust in the robot [41]. In other words, when people are exposed to the in-trouble behavior more than once, their trust in the robot may decrease, and they may not listen to the robot. In such a situation, the robot may not be able to surpass human performance even in the SR, in which it outperformed humans in this study. This long-term performance will be considered in future studies. In addition, it is possible to design unethical interactions that deceive humans by applying the in-trouble behavior. Such unethical interactions reduce confidence in robots as a whole, which can hinder the spread of robots. Therefore, we need to be careful about how the in-trouble behavior is used.

We should also consider that the results of this study are highly dependent on culture. Previous cross-cultural studies on human-robot interaction have shown that people from different cultures behave differently depending on the task and the appearance of the robot [42, 43]. For example, for people who regard robots as mechanical rather than human-like, a robot’s in-trouble behavior may seem creepy. In this case, the in-trouble behavior could deliver the worst result. In human–human interaction, we showed that pedestrians in Japan tend to avoid talking with strangers. In other cultures, where this tendency may be weaker, robots may not be able to outperform humans in terms of the SR. During this experiment, we did not interview any pedestrians who interacted with the robot. As we cannot infer how the pedestrians felt during their interaction with the robot, we cannot rigorously discuss how cultural differences affected them. Therefore, the cultural difference is another limitation of this study.

From a perspective close to that of cultural differences, the degree to which people are accustomed to robots affects the results of this study. In today’s society, social robots are still rare and intriguing objects. However, we believe that in a future society where a variety of social robots are prevalent, people will care less about what a robot does, even if it dances or behaves as if it is in trouble. Therefore, we need to consider that future results may differ from those obtained in this study.

6 Conclusion

This study investigated whether a humanoid robot can make pedestrians stop in front of it and listen to its message. We proposed three types of robot behavior: greeting, in-trouble, and dancing behaviors. The robot with each behavior was placed in a shopping mall, and the effectiveness of the robots for providing information was verified.

The results from the exploratory field experiments revealed that the in-trouble behavior, in which the robot behaves as if it is in trouble, can make more pedestrians stop and stay longer in front of the robot. These results were compared to the results achieved by four humans under the same situation, in which they attempted to make the pedestrians stop to provide information. The comparative results show that (1) the performance of the robots was higher than that of the humans in the stop rate (SR), and (2) the human performance was better than the robots’ performance in the distribution success rate (DSR). In terms of the whole distribution success rate (WDSR), the performance of the greeting and dancing behaviors of the robots is comparable to the human performance. Furthermore, the performance of the in-trouble behavior was higher than that of all the human advertisers who participated in this experiment. These findings demonstrate that the performance of robots is not inferior to that of humans in information-providing tasks. Therefore, service robots are expected to be able to perform well in the real world. In other words, the results of this study suggest that robots can be sufficient as a labor support technology, which is one of the goals of robotic research.

This study, however, has some limitations because it is difficult for the robots to interact naturally with pedestrians in a noisy environment. Automatic dialogue generation is difficult owing to the low accuracy of speech recognition for individual pedestrians in noisy environments. These problems are common to all robots that operate in real environments. Hence, the robot needs to be able to recognize the speech of only the target person in a noisy environment. Meanwhile, by letting robots and humans compensate for each other's weaknesses, we can build an integrated robot system that demonstrates high performance. We believe that designing robot systems in this complementary way will be important in the future.