Introduction

Research and development of robots that play with humans has been conducted since the late 1980s [1,2,3]. Soccer is a popular target game for robots, largely because of the RoboCup soccer project [4]. Since the beginning of that project, many soccer-playing robots have been developed [5,6,7,8].

Robots that play with children, especially those that play physical games such as sports, need to move around humans to achieve specific tasks. Developing such robots is a major challenge that involves ensuring the safety of both humans and robots, human-robot communication, understanding human behavior, and more.

We have developed a robot that plays the Darumasan ga koronda game with children [9, 10]. Darumasan ga koronda is a traditional Japanese children’s game. Several research works have treated this game as a subject of human-computer interaction [11, 12] or health care [13,14,15,16]. The game has two roles, the players and “it”: the players try to tag “it” while “it” is counting to ten.

Our final goal is to develop a robot that plays Darumasan ga koronda with children in both of the roles mentioned above. In previous work, we developed methods for a robot that plays the game as “it”, which track the players’ movements and determine whether a player is moving when the player should freeze [9]. To track the players [10], we applied a human detection method based on a laser range finder (LRF) that was originally developed for human following [17]. This method is robust when the person being tracked walks normally. However, in the game, players sometimes move in unusual postures, which causes the detection of their bodies to fail. Moreover, from the viewpoint of “it”, the players often overlap, which makes it difficult to sense the positions and movements of individual players.

In this paper, we propose two new methods to detect and track players robustly using an LRF. In the “The game and the previous works” section, we briefly describe the rules of the game and review previous work on person detection and tracking. In the “Analysis and improvement of person detection” section, we propose an improvement to the LRF-based person detection method. In the “Person tracking” section, we propose a multiple-person tracking method that can track players even when the robot loses some of the players’ positions. The “Experiment” section describes two experiments that confirm the effectiveness of the proposed player detection and tracking methods. In the “Evaluation of the total system” section, we describe the total system that can actually play with players and report the results of experimental games. Finally, we conclude the paper in the “Conclusion” section.

The game and the previous works

The “Darumasan ga koronda” game

First, we briefly introduce the game and define its rules [9, 18]. Darumasan ga koronda is a Japanese game similar to “Red Light, Green Light” or “Statues” as played by children in the United States or the United Kingdom. The game is played by several players and one “it”. First, “it” stands at one end of a field, and the players stand at the far end. Then, “it” turns away from the players (with its back facing them) and counts to ten. While “it” is counting, the players approach “it” to tag it. After counting, “it” turns around and looks at the players; at that moment, each player must freeze. A player who is still moving is caught by “it” and forms a chain with “it” by holding hands. When a player tags “it”, the player wins, all the caught players are released, and the game starts again from the beginning. If all players are caught, “it” wins, and another player becomes the next “it.”

Figure 1 shows illustrations of each state of the game.

  1. When the game starts, “it” (the red person in Fig. 1) and the other players stand far apart from each other.

  2. “It” turns its back to the players and counts to ten by saying “da-ru-ma-sa-n-ga-ko-ro-n-da” (these words have ten syllables). While “it” has its back to the players and is counting, the players approach “it” to tag it.

  3. After counting, “it” turns to face the players. At this moment, all of the players must freeze. When “it” finds a player moving, “it” calls that player “out.”

  4. The player who is called “out” is caught by “it” and forms a chain with “it” by holding hands. If all players are caught, “it” wins, and the game ends.

  5. If a player tags “it”, the caught players are released.

  6. When the players are released, “it” says “stop” to make the players stop.

  7. After the players stop, “it” moves three steps toward the nearest player. If “it” tags that player, the player is called “out” and caught by “it.” Then, the game starts again from the beginning.

We have implemented and evaluated the basic framework for a robot to play Darumasan ga koronda. Considering safety and ease of evaluation, we have slightly changed the rules of the game. Here are the differences from the ordinary rules:

  • The robot’s role is fixed as “it”, and the robot does not become a player even when it catches all of the players.

  • The robot does not turn around when counting.

  • The players are not caught by the robot even when they are called “out.”

  • After the robot is tagged, the robot calls “stop” and then moves toward the nearest player by a distance equal to three steps of a typical human.

  • When moving to the nearest player, the robot does not tag the player and instead calls “out” if it approaches within 0.5 m of the player.

The flowchart of the behavior of the robot is shown in Fig. 2.

Fig. 1 State of the game

Fig. 2 Flowchart of the robot behavior in the game

To realize a robot that plays the game, we need to develop methods that achieve the following six tasks.

  (A) Detection of persons.

  (B) Tracking of multiple persons.

  (C) Calculation of the motion of a person.

  (D) Detection and tracking of the nearest person.

  (E) Moving to the nearest person.

  (F) Speech synthesis.

Among the six tasks, person detection and tracking play a central role. Thus, we review previous work on person detection and tracking, and describe the basic person detection method that we improve in this paper.

Conventional person detection methods and their problems

A number of methods have been proposed for detecting and tracking persons. The sensors are installed either on the robot [19, 20] or in the environment [21], and the sensor types used include cameras [19, 21], LRFs [17, 19], and radars [22].

Considering the content of the game, it is difficult to install sensors in the environment because the game is played outdoors. Thus, the sensors need to be attached to the robot. Next, there are several choices of sensor, such as cameras, LRFs, radars, and ultrasound sensors. The game situation requires that (1) the sensor be robust against lighting conditions, because the game is played outdoors, (2) the sensor be able to measure the distance to players several meters away, and (3) the measurement period be short (less than 0.1 s per frame) so that moving players can be captured. Requirement (1) makes a camera unsuitable under varying weather conditions. Requirement (2) excludes ultrasound sensors because they cannot measure objects several meters away. Finally, requirement (3) can be satisfied by cameras, LRFs, and radars; however, camera-based methods, such as stereo vision, require computationally expensive processing. Based on these considerations, we chose an LRF as the sensor.

Many works have used LRFs for person detection. Horiuchi et al. proposed a method that detects human legs with a single LRF for the detection and tracking of persons by a small robot [23]. Similar methods that observe a person’s legs were also proposed by Sung and Chung [24], Chung et al. [25], Aguirre et al. [26], and Leigh et al. [27]. Carballo et al. developed a method that uses two LRFs to measure the shoulders and legs of a person [28]. The method proposed by Luo et al. [29] combines measurements from a single camera and an LRF that observes the legs of persons. Hoshino and Morioka [30] proposed a method that combines measurements from an LRF and a Kinect sensor.

These works can be classified into two types: methods that use a single LRF to measure a person’s legs [23,24,25,26,27] and methods that combine measurements from multiple sensors, including multiple LRFs [28,29,30].

Because most robots are much smaller than humans, it is reasonable to observe a human’s legs to estimate his/her position. Estimating a person’s position from the positions of his/her legs requires a model of the legs. For example, [27] assumes that the two legs are always observed separately by the LRF and that each leg can be tracked using a Kalman filter. Other methods, such as [24], exploit a model of the human walking pattern to estimate a person’s position robustly. However, as described in the “Analysis and improvement of person detection” section and shown in Fig. 5, players of the game adopt a much wider variety of postures than walking persons, so it is difficult to estimate the players’ positions precisely from their legs. Observing the players’ waists therefore seems more robust than observing their legs. Moreover, observing the waist also allows us to estimate the direction of the body. Finally, using only one LRF is advantageous over methods that use multiple sensors in terms of robot cost.

Conventional person tracking methods and their problems

Multiple-person tracking is another important task for realizing the playing robot. Here, we review person tracking methods based on LRF observations. Most works that detect persons using an LRF also perform person tracking [23, 25,26,27]. These works either simply associate people’s positions across successive scans [23, 24] or track their positions using a statistical method, such as a Kalman filter [27] or a particle filter [26]. They assume that the persons to be tracked can be observed continuously. However, when playing the game, the players overlap each other from the viewpoint of “it.” This problem could be solved by installing multiple sensors in the environment [21]; however, this solution does not suit our purpose because we want to play the game in any playground. Thus, we need to develop a multiple-person tracking method that can cope with overlapping players.

Person detection using a laser range finder

We chose a person detection method that uses an LRF installed on the robot to observe the waist positions of persons [17]. The advantage of an LRF is its robustness against lighting conditions [31], which is crucial for a robot that moves outdoors. Moreover, observing the players’ waists is more robust than observing their legs because the waist is less affected by unusual gaits and postures.

We have developed a system that achieves the abovementioned tasks. The details of the motion calculation (task C, used for the “out” judgment) were described in our previous paper [9]. Here, we briefly describe the person detection algorithm.

The LRF installed on the robot measures the distance to the nearest obstacle at each angle step, scanning counterclockwise. The LRF observes the human body at waist height (1.0 m above the floor). Compared with methods that observe persons at the shoulders [19] or legs [24], observing the waist can detect persons more robustly. Figure 3 shows the measurement of distances using an LRF. The triangle in the figure indicates the LRF, and the circles represent the measured points. The LRF measures the rightmost point first and then proceeds from right to left step by step. Let \(D(\theta )\) be the distance from the LRF to the obstacle at angle step \(\theta \). We then calculate the difference between neighboring distances at every angle step:

$$\Delta D(\theta ) = D(\theta ) - D(\theta - 1) \tag{1}$$

As shown in Fig. 3, \(|\Delta D( \theta )|\) is large at the boundary of the object (red points) and small within the object (blue points). Therefore, we can determine the boundary of the object by choosing the angle \(\theta \) where \(|\Delta D( \theta )|\) exceeds a pre-defined threshold \(D_{\text {th}}\). An observed point at angle \(\theta \) is determined as the rightmost point of an object when

$$\Delta D(\theta ) < - D_{\text {th}}, \tag{2}$$

and as the leftmost point when

$$\Delta D(\theta + 1) > D_{\text {th}}. \tag{3}$$
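To make the segmentation concrete, the following Python sketch applies Eqs. (1) to (3) to a single scan. It is an illustration under our own assumptions, not the implementation used on the robot: the scan is a plain list `D` of ranges in millimeters indexed from the rightmost beam, and the names `delta_d` and `segment_scan` are hypothetical.

```python
from typing import List, Optional, Tuple


def delta_d(D: List[float], theta: int) -> float:
    """Eq. (1): difference between neighboring range readings."""
    return D[theta] - D[theta - 1]


def segment_scan(D: List[float], d_th: float) -> List[Tuple[int, int]]:
    """Split one scan into objects; returns (rightmost, leftmost) beam indices.

    D[theta] is the range in mm; theta increases from the rightmost beam
    to the leftmost beam, as for the LRF described above.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for theta in range(1, len(D) - 1):
        # Eq. (2): a sharp drop in range marks the rightmost point of an object
        if delta_d(D, theta) < -d_th:
            start = theta
        # Eq. (3): a sharp rise just after theta marks its leftmost point
        if start is not None and delta_d(D, theta + 1) > d_th:
            segments.append((start, theta))
            start = None
    return segments


# A person at 1.5 m in front of a wall at 3 m, with a 300 mm threshold
print(segment_scan([3000, 3000, 1500, 1500, 1500, 3000, 3000], 300))  # [(2, 4)]
```

Only beams 2 to 4, which hit the nearer surface, form an object; the wall beams on either side never satisfy Eq. (2) or Eq. (3) on their own.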

After detecting the objects, we classify the objects into persons and non-persons based on their widths. Figure 4 shows the classification. The center point (red point) is the middle point of the leftmost and rightmost points. If the width (distance between the leftmost and rightmost points) of an object is similar to the typical width of a human body, that object is classified as a human. When two or more human-like objects are found in neighboring areas, the leftmost object is chosen as the candidate for the human body.
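The width test and the leftmost-candidate rule can be sketched as follows. The beam geometry (`angle0`, `step`), the width band `w_min`/`w_max`, and the `neighbor` distance that stands in for “neighboring areas” are hypothetical parameters chosen for illustration, not values reported in this paper; the function names are also our own.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def to_xy(theta: int, d: float, angle0: float, step: float) -> Point:
    """Beam index theta at range d (mm) -> (x, y); angles grow counterclockwise."""
    a = angle0 + theta * step
    return (d * math.cos(a), d * math.sin(a))


def detect_bodies(D: List[float], segments: List[Tuple[int, int]],
                  angle0: float, step: float,
                  w_min: float = 250.0, w_max: float = 600.0,
                  neighbor: float = 700.0) -> List[Point]:
    """Classify (rightmost, leftmost) segments by width and return body centers."""
    candidates = []  # (leftmost index, center) of human-like objects
    for right, left in segments:
        p_r = to_xy(right, D[right], angle0, step)
        p_l = to_xy(left, D[left], angle0, step)
        if w_min <= math.dist(p_r, p_l) <= w_max:       # human-like width
            center = ((p_r[0] + p_l[0]) / 2, (p_r[1] + p_l[1]) / 2)
            candidates.append((left, center))
    bodies: List[Tuple[int, Point]] = []
    for left, center in sorted(candidates):             # right-to-left order
        if bodies and math.dist(bodies[-1][1], center) < neighbor:
            bodies[-1] = (left, center)                 # neighbor further left wins
        else:
            bodies.append((left, center))
    return [center for _, center in bodies]


# 1-degree beams, everything at 2 m: a 10-beam object (about 349 mm wide)
# is human-like, while a 2-beam object (about 70 mm wide) is rejected.
print(detect_bodies([2000.0] * 30, [(0, 10), (20, 22)], 0.0, math.radians(1)))
```

Checking width on the chord between the two endpoints keeps the decision deterministic, in line with the classifier-free design discussed below.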

Fig. 3 Segmentation of the measurement data [17]

Fig. 4 Detection of human body

Many human detection methods utilize machine-learning-based classifiers for the final decision of human detection [32, 33]. Compared with those methods, the decision in our method is quite simple and deterministic. Although our method is not optimal from a statistical point of view, the advantage of our system is that it does not require any classifier optimization when installed in a new environment.

Analysis and improvement of person detection

The person detection algorithm explained in the previous section worked robustly when the persons to be tracked walked normally. However, when we applied it to players during the game, it often failed to detect the players or misdetected a player’s arm as the body. The main reason appeared to be that the players’ postures differed from those of normally walking persons.

Observation of players’ postures in human–human games

To investigate the players’ postures, we conducted an experiment in which we observed actual games. We asked five participants (four players and one “it”) to play Darumasan ga koronda in a large indoor area (24 m by 12 m) and recorded the games using three cameras. The game was played six times.

By observing the recorded videos, we found four patterns of player posture while moving. Examples of the postures are shown in Fig. 5. Figure 5a shows a moving form with the arms swinging at 30°, which was observed during ordinary walking. Figure 5b shows a moving form with the elbows bent at 90°, observed when a player was trotting. In Fig. 5c, a player swings his/her arms widely apart from his/her body; this posture was observed when the player was running. In Fig. 5d, a player keeps his/her hands down and swings his/her entire body while moving. When moving in this form, a player often puts his/her hands on his/her thighs.

Fig. 5 Observed postures of players when moving

We found two major patterns in which detection of the human body fails. When a player moves in a posture such as that shown in Fig. 5c, the left and right arms are detected as individual objects, as shown in Fig. 6a. If the arms appear as large objects, they are sometimes misclassified as human bodies. Because the leftmost object is chosen as the candidate for the body, the right arm is then recognized as the body, which shifts the estimated center point of the person. The other case occurs when the player’s posture is similar to that shown in Fig. 5a. In this case, one or both arms overlap the body, which causes the detection of the body boundary to fail. As shown in Fig. 6b, when the right arm overlaps the body, \(\Delta D(\theta +1)\) becomes negative (the point at \(\theta +1\) is nearer than that at \(\theta \)), so the condition in formula (3) is not satisfied and the leftmost point of the body (at angle \(\theta \), shown by the red line) is not detected. If \(\Delta D(\theta +1)\) is negative enough to satisfy formula (2), that is, \(\Delta D(\theta +1) < -D_{\text {th}}\), the rightmost point of the arm (the green point in Fig. 6b) is detected instead.

Thus, we need to develop a player detection method to detect the bodies of the players when the arms are separately detected as objects, as well as a method to detect the body boundary when the arms overlap the body.

Robust detection of the boundary points of the body

The observation of human–human play revealed that we need to focus on improving the processing for tasks (A) and (B) described in the “The “Darumasan ga koronda” game” section. Specifically, we need to develop two methods: detection of a player’s body when the arms are detected as separate objects, and determination of the body boundary when the arms overlap the body.

First, we solve the problem shown in Fig. 6b. As shown in Fig. 7a, when the right arm overlaps the body, detection of the body boundary points fails. Thus, we apply the following rules to determine a “temporary endpoint.”

Fig. 6 Problems of person detection

Fig. 7 Determination of temporary endpoint

Let us define \(P\left( \theta \right) \) as the surface point of angle step \(\theta \), and let the predicates RM(P) and LM(P) denote that a point P is the rightmost or leftmost point of a certain object, respectively. Let a predicate OBJ(P) denote that P is a point on the surface of an object. Then, the rules are described as follows.

  1. If \(RM(P(\theta ))\) and \(OBJ(P(\theta - 1))\), then \(P(\theta - 1)\) becomes a temporary endpoint (the leftmost point of the object).

  2. If \(LM(P(\theta ))\) and \(OBJ(P(\theta + 1))\), then \(P(\theta + 1)\) becomes a temporary endpoint (the rightmost point of the object).

An example of applying these rules is shown in Fig. 7b. Here, the point just to the right of the rightmost point of the right arm is determined as the temporary endpoint. Using the temporary endpoint, we can calculate the width and center of the body.
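A minimal Python sketch of the two rules follows. We assume here that OBJ(P) holds whenever the beam returns a range below a hypothetical `max_range`, i.e., the point lies on some surface rather than beyond the sensor's reach; the function names and the right-to-left pairing of endpoints are also our own illustrative choices, not the authors' code.

```python
from typing import List, Set, Tuple


def find_endpoints(D: List[float], d_th: float, max_range: float,
                   use_temporary: bool = True) -> Tuple[Set[int], Set[int]]:
    """Return (rightmost, leftmost) endpoint indices, optionally with temporary ones."""
    n = len(D)
    obj = [0.0 < d < max_range for d in D]                      # OBJ(P), assumed
    rm = {t for t in range(1, n) if D[t] - D[t - 1] < -d_th}   # RM via Eq. (2)
    lm = {t for t in range(n - 1) if D[t + 1] - D[t] > d_th}   # LM via Eq. (3)
    if not use_temporary:
        return rm, lm
    temp_left = {t - 1 for t in rm if obj[t - 1]}    # rule 1
    temp_right = {t + 1 for t in lm if obj[t + 1]}   # rule 2
    return rm | temp_right, lm | temp_left


def segments_with_temporary(D: List[float], d_th: float, max_range: float,
                            use_temporary: bool = True) -> List[Tuple[int, int]]:
    """Pair each rightmost endpoint with the next leftmost one, right to left."""
    rights, lefts = find_endpoints(D, d_th, max_range, use_temporary)
    segments: List[Tuple[int, int]] = []
    start = None
    for t in range(len(D)):
        if t in rights:
            start = t
        if t in lefts and start is not None:
            segments.append((start, t))
            start = None
    return segments


# Body at 1.5 m (beams 2-4) with an arm at 1.2 m (beams 5-6) overlapping its left side
scan = [3000, 3000, 1500, 1500, 1500, 1200, 1200, 3000, 3000]
print(segments_with_temporary(scan, 200, 5600))                       # [(2, 4), (5, 6)]
print(segments_with_temporary(scan, 200, 5600, use_temporary=False))  # [(5, 6)]
```

Without the temporary endpoints, the body's missing leftmost point makes the body segment disappear and only the arm survives, which reproduces the failure in Fig. 6b; with rule 1, beam 4 closes the body segment.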

Determination of the best candidate for the human body

The problem shown in Fig. 6a occurs when the left and right arms are determined to be bodies. We can avoid this problem by using the fact that the body is usually wider than the arms. Therefore, when two or more candidates for the human body are detected, we compare their widths and choose the widest object as the final candidate for the body. Figure 8 shows an example. In this example, three objects are detected, with widths \(W_1, W_2\) and \(W_3\). Comparing these widths, we choose the widest one (here, the center object with width \(W_2\)).
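The difference between the previous leftmost rule and method B can be illustrated with a short Python sketch. Each candidate is given as a hypothetical pair of Cartesian endpoints (rightmost, leftmost) in millimeters, ordered from right to left with x pointing forward and y to the left; the widths are our own example values, loosely modeled on Fig. 8.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]  # (rightmost point, leftmost point), mm


def width(seg: Segment) -> float:
    return math.dist(seg[0], seg[1])


def center(seg: Segment) -> Point:
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def body_by_leftmost(cands: List[Segment]) -> Point:
    """Previous rule: the leftmost human-like object is taken as the body."""
    return center(cands[-1])


def body_by_width(cands: List[Segment]) -> Point:
    """Method B: the widest human-like object is taken as the body."""
    return center(max(cands, key=width))


# Arm, body, arm seen 2 m ahead, listed from right to left
cands = [((2000.0, -420.0), (2000.0, -200.0)),   # arm, width 220 mm
         ((2000.0, -180.0), (2000.0, 180.0)),    # body, width 360 mm
         ((2000.0, 200.0), (2000.0, 440.0))]     # arm, width 240 mm
print(body_by_leftmost(cands))  # (2000.0, 320.0): an arm
print(body_by_width(cands))     # (2000.0, 0.0): the body
```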

Fig. 8 Determination of the body

Person tracking

In this section, we propose a multiple-player tracking method. As explained in the “Conventional person tracking methods and their problems” section, conventional person tracking methods do not cope with occlusion among players, which is inevitable for the playing robot. Thus, we developed a simple method for tracking players who have been lost because they were hidden by other players. In this section, we first describe the basic person tracking method and then describe a method to track lost players again.

The basic person tracking method

As explained in the previous section, person detection is performed for each LRF scan. After detecting the human bodies, we need to track the players and ignore detected persons who are not players. To do this, we employ a simple multiple-person tracking method.

First, we assume that only the players are present in the pre-defined playing area. Figure 9 shows the initial positions of the players. We define a rectangular area (shown as red rectangles in the figure) around the position of each player. Let the position of player p at time t be \((x_p(t),y_p(t))\). Then the area of player p is defined as

$$\begin{aligned} S_p(t) = \{(x,y) :\; & x_p(t)-T_{x1} \le x \le x_p(t)+T_{x2} \text { and } \\ & y_p(t)-T_{y1} \le y \le y_p(t)+T_{y2}\} \end{aligned} \tag{4}$$

Here, \(T_{x1}, T_{x2}, T_{y1}\) and \(T_{y2}\) are thresholds. When the position of a person q detected at time \(t+1\) satisfies \((x_q(t+1),y_q(t+1)) \in S_p(t)\), we regard q as identical to player p at the previous time t. In the experiments described later, we set \(T_{x1}=T_{x2}=300\) mm, \(T_{y1}=600\) mm, and \(T_{y2}=300\) mm. \(T_{y1}\) was set larger than \(T_{y2}\) because the players move toward the robot.
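The association test of Eq. (4) with these thresholds can be sketched in Python as follows. We assume that x is the lateral axis and y grows with distance from the robot (so approaching players move toward smaller y); the tie-breaking rule in the hypothetical `associate` function, where each player takes the first unused detection inside its area, is our own simplification.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

# Thresholds used in the experiments (mm)
T_X1, T_X2, T_Y1, T_Y2 = 300.0, 300.0, 600.0, 300.0


def in_area(prev: Point, new: Point) -> bool:
    """Eq. (4): is `new` inside the rectangle S_p(t) around `prev`?"""
    xp, yp = prev
    x, y = new
    return (xp - T_X1 <= x <= xp + T_X2) and (yp - T_Y1 <= y <= yp + T_Y2)


def associate(players: Dict[str, Point],
              detections: List[Point]) -> Dict[str, Optional[Point]]:
    """Give each player the first unused detection in its area, or None."""
    result: Dict[str, Optional[Point]] = {}
    used = set()
    for name, prev in players.items():
        result[name] = None
        for i, det in enumerate(detections):
            if i not in used and in_area(prev, det):
                result[name] = det
                used.add(i)
                break
    return result


players = {"A": (0.0, 3000.0), "B": (1000.0, 3000.0)}
print(associate(players, [(1050.0, 2700.0), (50.0, 2600.0)]))
# {'A': (50.0, 2600.0), 'B': (1050.0, 2700.0)}
```

Because the rectangle extends 600 mm toward the robot but only 300 mm away from it, a player who steps forward between scans stays inside its own area.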

Fig. 9 Tracking three persons

Tracking of lost players

This tracking algorithm works as long as the robot does not lose the person being tracked. However, it fails when a player is hidden by another player. In the case shown in Fig. 10, player A goes behind player B, so player A is not detected (and not tracked).

Fig. 10 An example where tracking fails when a player goes behind another player

In this case, the previous position of player A should be kept in the tracking system. When the system loses player A at time t, the position is assumed to be the same as the position at time \(t-1\).

$$\left(x_A(t),y_A(t)\right) = \left(x_A(t-1),y_A(t-1)\right) \tag{5}$$

If a person detected at time \(t^\prime >t\) enters \(S_A(t^\prime -1)\), the person is regarded as player A and tracked again. Figure 11 shows the tracking algorithm. Here, d(p, q) is the distance between two persons p and q, i.e.,

$$d(p,q)=\sqrt{(x_p - x_q)^2+(y_p-y_q)^2} \tag{6}$$
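The following Python sketch combines Eqs. (4), (5), and (6) into one tracking update. It is an illustration under stated assumptions, not a transcription of Fig. 11: we use d(p, q) to resolve conflicts by matching the closest player/detection pairs first, we assume y grows with distance from the robot, and the name `track_step` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
T_X1, T_X2, T_Y1, T_Y2 = 300.0, 300.0, 600.0, 300.0   # thresholds (mm)


def in_area(prev: Point, new: Point) -> bool:
    """Eq. (4): is `new` inside the rectangle S_p(t) around `prev`?"""
    return (prev[0] - T_X1 <= new[0] <= prev[0] + T_X2 and
            prev[1] - T_Y1 <= new[1] <= prev[1] + T_Y2)


def track_step(players: Dict[str, Point], detections: List[Point]) -> Dict[str, Point]:
    """One scan update; players without a matching detection keep their position."""
    # Candidate (distance, player, detection) pairs; distance is d(p, q) from Eq. (6)
    pairs = sorted((math.dist(prev, det), name, i)
                   for name, prev in players.items()
                   for i, det in enumerate(detections)
                   if in_area(prev, det))
    updated: Dict[str, Point] = {}
    used = set()
    for _, name, i in pairs:                 # closest pairs first
        if name not in updated and i not in used:
            updated[name] = detections[i]
            used.add(i)
    for name, prev in players.items():
        updated.setdefault(name, prev)       # lost player: Eq. (5)
    return updated


p0 = {"A": (0.0, 3000.0), "B": (0.0, 2500.0)}
p1 = track_step(p0, [(0.0, 2400.0)])               # A hidden behind B: A holds still
p2 = track_step(p1, [(0.0, 2300.0), (200.0, 2800.0)])  # A re-enters S_A
print(p1["A"], p2["A"])  # (0.0, 3000.0) (200.0, 2800.0)
```

In the first scan the only detection lies inside both areas, and matching by distance gives it to the nearer player B, so A keeps its last position as in Eq. (5).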

Figure 12 shows another example in which tracking fails. In this case, the lost player A becomes visible again, but A’s new position is outside \(S_A(t-1)\), so A is not identified as the player observed before. To address this case, we introduce the following rule:

If a player is known to be lost and another person exists near the last position where the lost person was observed, the newly observed person is regarded as the lost person.

While player p is lost, his/her tracked position does not change at all:

$$(x_p(t-1),y_p(t-1))=(x_p(t),y_p(t)). \tag{7}$$

We therefore use this condition to determine whether a player is lost. Figure 13 shows the revised algorithm. In this algorithm, the detected persons are first associated with the players based on the thresholds, and the lost players are then searched for among the detected persons who are not associated with any player.
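The revised algorithm can be sketched in Python as a two-stage update. This is our own reading of the rule, not the authors' implementation: the radius `near_mm` that defines “near the last position” is a hypothetical parameter (no value is given here), matching the closest pairs first is our choice, and y is assumed to grow with distance from the robot.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

Point = Tuple[float, float]
T_X1, T_X2, T_Y1, T_Y2 = 300.0, 300.0, 600.0, 300.0   # thresholds (mm)


def in_area(prev: Point, new: Point) -> bool:
    """Eq. (4): is `new` inside the rectangle S_p(t) around `prev`?"""
    return (prev[0] - T_X1 <= new[0] <= prev[0] + T_X2 and
            prev[1] - T_Y1 <= new[1] <= prev[1] + T_Y2)


def greedy_match(players: Dict[str, Point], detections: List[Point], used: Set[int],
                 accept: Callable[[Point, Point], bool]) -> Dict[str, Point]:
    """Match players to unused accepted detections, closest pairs first (marks `used`)."""
    pairs = sorted((math.dist(p, d), name, i)
                   for name, p in players.items()
                   for i, d in enumerate(detections)
                   if i not in used and accept(p, d))
    matched: Dict[str, Point] = {}
    for _, name, i in pairs:
        if name not in matched and i not in used:
            matched[name] = detections[i]
            used.add(i)
    return matched


def track_step2(players: Dict[str, Point], detections: List[Point],
                near_mm: float = 1000.0) -> Dict[str, Point]:
    used: Set[int] = set()
    # Stage 1: gate-based association; unmatched players hold still (Eq. 5)
    updated = dict(players)
    updated.update(greedy_match(players, detections, used, in_area))
    # A player whose position did not change is regarded as lost (Eq. 7)
    lost = {name: p for name, p in players.items() if updated[name] == p}

    def near(p: Point, d: Point) -> bool:
        return math.dist(p, d) <= near_mm

    # Stage 2: lost players take leftover detections near their last position
    updated.update(greedy_match(lost, detections, used, near))
    return updated


# Fig. 12-like case: A reappears 500 mm to the side, outside S_A
players = {"A": (0.0, 3000.0), "B": (0.0, 2400.0)}
print(track_step2(players, [(0.0, 2300.0), (500.0, 2900.0)]))
# {'A': (500.0, 2900.0), 'B': (0.0, 2300.0)}
```

Algorithm 1 alone would leave A frozen at (0.0, 3000.0) here, because the reappearing detection falls outside the rectangle S_A.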

Fig. 11 The person tracking algorithm 1

Fig. 12 An example of tracking failure after the re-detection of player A

Fig. 13 The person tracking algorithm 2

Note that this algorithm works only when the players alone are in the area observed by the LRF; otherwise, it may incorrectly identify non-players as players.

Experiment

We conducted three experiments. The first examines whether the proposed human detection method can handle the various postures shown in Fig. 5. The second tests the tracking performance when players overlap each other. The last is an evaluation of the total system by playing Darumasan ga koronda.

The robots used in the experiment

Figure 14 shows the robot system used in the experiments. The mobile robot is the same as that used in our previous work [9]. Table 1 shows the specifications of the mobile robot, and Table 2 shows those of the LRF installed on the robot. The LRF is installed at a height of 1000 mm on the robot. To play Darumasan ga koronda, the robot (playing the role of “it”) first needs to face the players and then turn around. However, from a safety point of view, it is not desirable for a large robot to turn around. Therefore, we installed a robot avatar [34] on the mobile robot base. The robot avatar is a small communication robot that can make gestures. Letting the avatar, rather than the whole robot, turn and make gestures makes the robot safer.

Fig. 14 The mobile robot (Carry PM3)

Table 1 Specifications of the mobile robot
Table 2 Specifications of the LRF (URG-04LX-UG01)

Figure 15 shows the robot avatar. Figure 15a is a front view of the avatar, and Fig. 15b shows its degrees of freedom. When the robot calls “out”, the robot avatar points to the player and utters “out” using a speech synthesizer.

Fig. 15 The robot avatar

Experiment I

We first investigated the effect of the proposed person detection methods. Here, we denote the method proposed in the “Robust detection of the boundary points of the body” section as “method A” (determination of the endpoint when an arm overlaps the body) and that in the “Determination of the best candidate for the human body” section as “method B” (selection of the best candidate for the human body).

Figure 16 shows the experimental environment. We prepared three walking paths (A-B, C-D, E-F) and placed a mark every 900 mm along each path; the participant stepped on the marks while walking. The participant was recorded by two video cameras from different angles, as shown in Fig. 16.

Fig. 16 Experimental environment

A participant walked each of the three paths ten times (30 walks in total); for each path, five of the walks were analyzed with method A and the other five without it. The walking speed was 1.8 m/s. For both sets of data, we calculated the body center coordinates with and without method B.

Figure 17 shows examples of the experimental results. We tested four conditions (with/without method A and with/without method B). Figure 17a shows the results without method A, and Fig. 17b shows those with method A. The “\(\times \)” marks show the results without method B, and the “\(+\)” marks show those with method B. At most points, the two marks overlap and look like “✳”. In Fig. 17a, the target person was lost at approximately \(Y=2000\) mm both with and without method B; this was improved by method A (Fig. 17b). Moreover, in Fig. 17b, method A without method B (“\(\times \)” marks) misdetected the arm as the body (“Moved point” in the figure), which was corrected by applying method B (“\(+\)” marks).

Fig. 17 Trajectory of the center points of the target person

The misdetection rates are shown in Table 3. These values are the ratios of misdetected or lost body-center points. In this experiment, the participant moved at a fixed velocity (1.8 m/s), and the LRF measured the participant’s position ten times per second. We therefore assumed that the participant’s velocity was exactly constant and compared the expected center point with the measured point. When the expected and measured points differed by more than 120 mm, we regarded the point as misdetected. Figure 18 shows the measurement and the misdetection decision. Table 3 shows that applying methods A and B drastically reduces the misdetection rate. We did not observe any significant difference in the misdetection rates among the three paths (A-B, C-D, E-F).
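The misdetection decision can be sketched in Python as follows. At 1.8 m/s and ten scans per second, the expected center advances 180 mm per scan, and a scan counts as misdetected if its center is missing or lies more than 120 mm from the expected point. Anchoring the expected trajectory at a known start point and walking direction, and the name `misdetection_rate`, are our assumptions about how the comparison could be set up.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def misdetection_rate(measured: List[Optional[Point]], start: Point,
                      direction: Point, speed_mm_s: float = 1800.0,
                      scans_per_s: float = 10.0, tol_mm: float = 120.0) -> float:
    """Percentage of scans whose body center is lost (None) or off by > tol_mm."""
    norm = math.hypot(*direction)
    ux, uy = direction[0] / norm, direction[1] / norm
    step = speed_mm_s / scans_per_s          # 180 mm per scan at 1.8 m/s, 10 Hz
    bad = 0
    for k, m in enumerate(measured):
        expected = (start[0] + ux * step * k, start[1] + uy * step * k)
        if m is None or math.dist(m, expected) > tol_mm:
            bad += 1
    return 100.0 * bad / len(measured)


# Five scans of a person walking toward the robot (decreasing y):
# one lost point and one point 300 mm off the expected track
track = [(0.0, 4000.0), (10.0, 3820.0), None, (0.0, 3460.0), (300.0, 3280.0)]
print(misdetection_rate(track, start=(0.0, 4000.0), direction=(0.0, -1.0)))  # 40.0
```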

Table 3 Misdetection rates of person detection for different paths (%)

Fig. 18 Decision of misdetection of a measured person’s position

We also investigated the effect of walking speed. In this experiment, only the A-B path was used, and the participant walked at three speeds (0.7 m/s, 1.4 m/s, and 1.8 m/s), three times for each speed. Method A was always applied, and we compared the results with and without method B. The results are shown in Table 4. Without method B, detection errors occurred at rates of 1.4% to 4.8%, whereas no misdetection occurred with method B.

Table 4 Misdetection rates of person detection for different speeds (%)

Next, we investigated the effect of walking form. As shown in Fig. 5, there are four types of walking forms when playing the game. Thus, we asked a participant to walk in the four walking forms shown in Fig. 19.

One participant walked in three of the forms (Fig. 19a–c) at 1.8 m/s and in the form of Fig. 19d at 0.7 m/s, because it was difficult to move quickly in form (d). Each form was tested three times.

Fig. 19 Four forms examined in the experiment

In this experiment, the proposed method A was always applied, and we compared the results with and without method B. Examples of the results are shown in Fig. 20. Applying method B improved the estimated trajectory for forms (a), (b), and (c). For form (d), the result without method B was already good, so applying method B brought no further improvement.

Fig. 20 Estimated trajectories for different walking forms

Table 5 shows the overall results. Method B reduced the misdetection of the players, although a detection error rate of 0.69% remained for form (c) even with method B. We did not observe any detection errors for form (d).

Table 5 Misdetection rates of person detection for different forms (%)

Experiment II

In the next experiment, three players moved according to various patterns, and we tested whether the system could track all of the players even when a player was hidden by other players. Figure 21 shows the experimental environment. As shown in the figure, there were twenty-one position marks in the playing field, and the three players moved between the marks synchronously. We prepared twenty motion patterns, as shown in Table 6. Initially, players A, B, and C stood at positions 1, 3, and 5, respectively, and then moved according to the patterns. The patterns in Table 6 were designed so that at least one player was hidden by another player, or two or more players stood at the same position, in order to provoke misdetection of the players. In Table 6, bold letters indicate that the players at that position were hidden by other players, and italic letters indicate that two or more players stood at the same position.

Fig. 21 Experimental environment

Table 6 Motion patterns

As a result, all players were correctly detected and tracked at their final positions in all patterns. As examples, images of the players and screenshots of the player detection system for patterns 8 and 20 are shown in Figs. 22 and 23, respectively. Video clips of these two motions are provided as Additional files 1 and 2.

Fig. 22 Result of Pattern 8

Fig. 23 Result of Pattern 20

In Fig. 22, player B moved forward, and player C moved behind player B (Fig. 22b, c). At this time, the system lost player C, which can be confirmed in Fig. 22f in which only the center point (the pink circle) is displayed in the green rectangle and the body was not detected. As player C stepped aside (Fig. 22h, i), the system found player C and tracked him again (Fig. 22l).

In Fig. 23, the three players stood at the same position (Fig. 23c) and moved forward (Fig. 23h). At this time, the system could not recognize the players as individual persons (Fig. 23k). When players A and C stepped aside (Fig. 23i), the system detected all of the players again (Fig. 23l).

Evaluation of the total system

Finally, we investigated the total system by actually playing the game (see the attached video). Two players participated in the game together. The environment of the game was the same as that shown in Fig. 16. The game was recorded by two video cameras (Video 1 and Video 3). The two players started from the positions A and E, respectively. Five sets of play were examined, with each set ending when either a player tagged the robot or the two players were called “out”. When a player tagged the robot, the robot called “stop” and tried to get nearer to the nearest player. If the robot could arrive within 4.095 m of the nearest player (the typical length of three steps by a human player [9]), “it” won; otherwise, the players won.

As a result, “da-ru-ma-sa-n-ga-ko-ro-n-da” was called 18 times (3.6 times per set). Neither player was called “out” in any set. In the first and second sets, “it” won by moving to the nearest player. In the third and fourth sets, the nearest player moved outside the observation range of the LRF (5.6 m), and the players won. In the fifth set, the players won by moving beyond the robot’s moving range (4.095 m).
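The end-of-set rule can be written as a small decision function. This is a minimal sketch under stated assumptions: player positions are given in metres relative to the robot (a hypothetical input format), a player counts as observed only within the 5.6 m LRF range, and "it" wins when the nearest observed player lies within the 4.095 m moving range. The function name `judge_after_tag` is illustrative, not part of the described system.

```python
import math

REACH_M = 4.095     # robot's moving range: typical length of three human steps [9]
LRF_RANGE_M = 5.6   # observation range of the LRF used in this experiment

def judge_after_tag(player_positions):
    """Decide the winner once a player has tagged the robot.

    player_positions: (x, y) positions in metres relative to the robot,
    a hypothetical input format assumed for this sketch.
    """
    distances = [math.hypot(x, y) for x, y in player_positions]
    observed = [d for d in distances if d <= LRF_RANGE_M]
    if not observed:
        return "players"  # nobody is visible, so the robot has no one to approach
    return "it" if min(observed) <= REACH_M else "players"
```

For example, a nearest player about 3.2 m away gives a win for "it", while a nearest player at 4.5 m, or all players beyond 5.6 m, gives a win for the players, matching the outcomes described for the five sets.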

We confirmed that the robot avatar pointed to the appropriate player when calling “out”. In the experiment, however, the robot could judge only those players who were within the observation range of the LRF. Players at greater distances could be judged by replacing the LRF with one that has a longer measurement range (such as the UTM-30LX).

Figure 24 shows frames from the two video recordings of the fourth set. The new system detected and tracked the players even when they raised their arms, a situation in which the previous system failed to track the players’ bodies.

Fig. 24

Evaluation experiment of the total system

Conclusion

In this paper, we proposed methods to improve LRF-based player detection and tracking in order to realize a robot that plays a game with humans. For player detection, the previous method had the limitation that the target player was lost when his/her arms were detected as a body or when his/her arms overlapped his/her body. The proposed method first determines the leftmost and rightmost points of an object, even when the arms overlap the body, and then identifies the body based on the width of the object within the observation area. Moreover, we proposed an improved player-tracking algorithm that robustly re-discovers and tracks players who have been temporarily hidden by other players.
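The width-based idea can be illustrated with a simplified sketch: split a single LRF scan into objects at range discontinuities, take each object's two extreme points, and accept it as a body when the distance between them falls in a plausible torso range. This omits the paper's handling of arms that overlap the body, and the thresholds `JUMP_M` and `BODY_WIDTH_M` as well as the scan format `(angle_rad, range_m)` are assumptions for illustration only.

```python
import math

# Hypothetical thresholds (metres); assumptions, not the paper's parameters.
JUMP_M = 0.15                  # range discontinuity that separates two objects
BODY_WIDTH_M = (0.25, 0.60)    # accepted object width for a human torso

def _to_xy(angle, r):
    return (r * math.cos(angle), r * math.sin(angle))

def segment(scan):
    """Split an angle-ordered scan [(angle_rad, range_m), ...] into objects
    wherever consecutive ranges jump by more than JUMP_M."""
    if not scan:
        return []
    objects, current = [], [scan[0]]
    for prev, cur in zip(scan, scan[1:]):
        if abs(cur[1] - prev[1]) > JUMP_M:
            objects.append(current)
            current = []
        current.append(cur)
    objects.append(current)
    return objects

def detect_bodies(scan):
    """Return (x, y) centres of objects whose width matches a body."""
    bodies = []
    for obj in segment(scan):
        # The first and last points of an object are its two extreme points.
        p, q = _to_xy(*obj[0]), _to_xy(*obj[-1])
        width = math.dist(p, q)
        if BODY_WIDTH_M[0] <= width <= BODY_WIDTH_M[1]:
            bodies.append(((p[0] + q[0]) / 2, (p[1] + q[1]) / 2))
    return bodies
```

Measuring width between the extreme points, rather than counting scan points, keeps the test independent of distance from the sensor, since a nearby body returns more points than a distant one of the same width.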

We conducted experiments to confirm that the proposed method detects the target persons robustly. For four player postures, we confirmed that the proposed method rarely lost or misdetected the players, regardless of posture. An evaluation experiment of the proposed tracking method also showed that it worked under various motion patterns.

Finally, we conducted an actual game with two human players, in which the robot successfully tracked the players and performed its role throughout the entire game.

The person detection and tracking method developed in this paper can contribute not only to the Darumasan ga Koronda game but also to other applications. For example, a robust person tracking method that can re-discover the target person will contribute to the realization of a robot that follows a person [35], which can be applied to a robotic hand cart [36] or a smart wheelchair [27]. This technology can also be applied to a guidance robot [37, 38] that guides guests to a destination. While guiding, the robot may lose sight of the guest when the guest is hidden by other people; with the proposed method, the guidance robot can re-discover the guest and continue tracking. Because the LRF is not affected by lighting conditions, the proposed method is also effective for realizing a patrol robot [39,40,41] that works at night. Finally, although this work aims to realize a robot that plays Darumasan ga Koronda, the proposed person detection and tracking technology can be used for other games, such as tag [42, 43].

In future work, we will extend the robot system so that the robot can also act as a player (rather than “it”), switching between the roles of player and “it.” In addition, the game experiment was conducted indoors, but the game should also be playable outdoors.

Additionally, we need to consider the safety of the players (since the players may be children) as well as the safety of the robot (since children sometimes harm robots) [44].