1 Introduction

Tech United EindhovenFootnote 1 (established 2005) is the RoboCup student team of Eindhoven University of TechnologyFootnote 2 (TU/e), which joined the ambitious @Home League in 2011. The RoboCup @Home competition aims to develop service robots that can perform everyday tasks in dynamic and cluttered ‘home’ environments. In previous years, the team obtained multiple world vice-champion titles in the Open Platform League (OPL) of the RoboCup @Home competition, and this year, whilst competing in the Domestic Standard Platform League (DSPL) for the first time, it finally claimed the world championship title. In the DSPL, all teams compete with the same hardware: a Toyota Human Support Robot (HSR) and the same external devices. Any differences between the teams therefore lie solely in the software they implement and use.

Tech United Eindhoven consists of (former) PhD and MSc students and staff members from different departments within the TU/e. This year, these team members successfully migrated the software from our TU/e-built robots, AMIGO and SERGIO, to HERO, our Toyota HSR. This software base is developed to be robot independent, which means that the years of development on AMIGO and SERGIO now directly benefit HERO. Thus, a large part of the developments discussed in this paper have been optimized over many years, whilst the DSPL competition has only existed since 2017Footnote 3. All the software discussed in this paper is available open source on GitHubFootnote 4, together with various tutorials to assist with implementation. The main developments that resulted in the large lead at RoboCup 2019, and eventually the championship, are our central world model, discussed in Sect. 2, the generalized people recognition, discussed in Sect. 4, and the head display, discussed in Sect. 5.3.

2 Environment Descriptor (ED)

The TU/e Environment Descriptor (ED) is a Robot Operating System (ROS) based 3D geometric, object-based world representation system for robots. ED is a database system that structures multi-modal sensor information and represents it such that it can be utilized for robot localisation, navigation, manipulation and interaction. Figure 1 shows a schematic overview of ED.

ED has been used on our robots in the OPL since 2012 and was also used this year in the DSPL. Previous developments have focused on making ED platform independent; as a result, ED has been used on the PR2, Turtlebot and Dr. Robot systems (X80), as well as on multiple other @Home robots.

Fig. 1.

Schematic overview of the TU/e Environment Descriptor. Double-sided arrows indicate that information is shared both ways; one-sided arrows indicate that information is shared in one direction only.

ED is a single re-usable environment description that can be used for a multitude of desired functionalities, such as object detection, navigation and human-machine interaction. Improvements in ED are directly reflected in the performance of the separate robot skills, as these skills are closely integrated with ED. This single world model keeps all data current and consistent without requiring the updating and synchronization of multiple world models. Currently, different ED plug-ins exist that enable the robots to localize themselves, update positions of known objects based on recent sensor data, segment and store newly encountered objects, and visualize all of this in RViz and through a web-based GUI, as illustrated in Fig. 9. ED allows all the different subsystems that are required to perform challenges to work together robustly. These subsystems are shown in Fig. 2 and are individually elaborated upon in this paper.
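To illustrate the idea of a single shared world model, the sketch below shows a hypothetical Python executive querying one entity description and reusing it for two different skills. The WorldModelClient wrapper and its methods are illustrative assumptions, not ED's actual API.

```python
# Hypothetical sketch of skills sharing one world model. The WorldModelClient
# wrapper is an assumption for illustration only; it shows that every skill
# queries the same, always up-to-date entity, so nothing needs synchronizing.

class WorldModelClient(object):
    """Toy stand-in for a query interface to the ED world model."""
    def __init__(self):
        self._entities = {'cabinet': {'pose': (3.1, 1.4, 0.0), 'type': 'furniture'}}

    def get_entity(self, entity_id):
        return self._entities[entity_id]

world_model = WorldModelClient()

# Navigation and manipulation both use the same entity description.
cabinet = world_model.get_entity('cabinet')
navigation_goal = cabinet['pose'][:2]   # drive towards the cabinet
grasp_reference = cabinet['pose']       # later: grasp/place w.r.t. the same entity
print(navigation_goal, grasp_reference)
```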

Fig. 2.

A view of the data interaction with robot skills that ED is responsible for.

2.1 Localization, Navigation and Exploration

The ed_localizationFootnote 5 plugin implements AMCL based on a 2D render of the central world model. With the ed_navigation pluginFootnote 6, an occupancy grid is derived from the world model and published. With the cb_base_navigation packageFootnote 7, the robots are able to deal with end-goal constraints. The ed_navigation plugin allows such a constraint to be constructed w.r.t. a world model entity in ED. This enables the robot to navigate not only to areas or entities in the scene, but also to waypoints. Figure 3 shows the navigation to an area. Modified versions of the local and global ROS planners available within move_base are used.
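As an illustration of an end-goal constraint defined w.r.t. a world model entity, the sketch below builds a simple radius constraint around an entity's position. The constraint string format, helper name and coordinates are assumptions for illustration; the actual ed_navigation and cb_base_navigation interfaces may differ.

```python
# Minimal sketch, assuming a goal region can be expressed as an inequality on
# (x, y) around an ED entity. Not the actual cb_base_navigation interface.

def radius_constraint(entity_xy, radius=0.7):
    """Constrain the end goal to a disc around an ED entity (e.g. 'cabinet')."""
    x, y = entity_xy
    return "(x-{:.2f})^2 + (y-{:.2f})^2 < {:.2f}^2".format(x, y, radius)

cabinet_xy = (3.1, 1.4)                      # would normally be queried from ED
constraint = radius_constraint(cabinet_xy)   # handed to the global planner as a goal region
print(constraint)
```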

Fig. 3.

A view of the world model created with ED. The figure shows the occupancy grid as well as classified objects recognized on top of the cabinet.

2.2 Detection and Segmentation

ED enables integrating sensors through the use of the plugins present in the ed_sensor_integration package. Two different plugins exist:

  1. laser_plugin: Enables tracking of 2D laser clusters. This plugin can be used to track dynamic obstacles such as humans.

  2. kinect_plugin: Enables world model updates using data from an RGBD camera. This plugin exposes several ROS services that realize different functionalities:

     (a) Segment: A service that segments sensor data that is not associated with existing world model entities. Segmentation areas can be specified per entity in the scene, which allows segmenting objects ‘on-top-of’ or ‘in’ a cabinet. All points outside the segmented area are ignored during segmentation.

     (b) FitModel: A service that fits a specified model to the sensor data of an RGBD camera. This allows updating semi-static obstacles such as tables and chairs.

The ed_sensor_integration plugins enable updating and creating entities. However, new entities are initially classified as unknown. Classification is done in the ed_perception packageFootnote 8.
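The sketch below shows how a Python executive might call the kinect_plugin's Segment service. The service path, request field and response field are assumptions for illustration; consult the ed_sensor_integration package for the actual service definition.

```python
#!/usr/bin/env python
# Minimal sketch of calling the Segment service from a Python executive.
# Service name, request and response fields are assumptions, not the real .srv.
import rospy
from ed_sensor_integration.srv import Segment  # assumed service definition

rospy.init_node('segment_example')
rospy.wait_for_service('/ed/kinect/segment')
segment = rospy.ServiceProxy('/ed/kinect/segment', Segment)

# Segment everything 'on-top-of' the cabinet; unassociated clusters become new entities
response = segment(area_description='on_top_of cabinet')   # field name assumed
rospy.loginfo('Segmented %d new entities', len(response.entity_ids))  # field name assumed
```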

2.3 Object Grasping, Moving and Placing

The system architecture developed for object manipulation is focused on grasping. Its input is a specific target entity in ED, selected by a Python executive, and its output is the grasp-motion joint trajectory. Figure 4 shows the grasping pipeline.

Fig. 4.

Custom grasping pipeline based on ED, MoveIt and a separate grasp point determination and approach vector node.

MoveIt! is used to produce joint trajectories over time, given the current configuration, robot model, ED world model (for collision avoidance) and the final configuration.

The grasp pose determination uses the information about the position and shape of the object in ED to determine the best grasping pose. The grasping pose is a vector relative to the robot. An example of a determined grasping pose is shown in Fig. 5. Placing an object is approached in a similar manner to grasping, except that, when placing, ED is queried to find an empty placement pose.
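A minimal sketch of the idea behind grasp pose determination is given below: the gripper approaches the object horizontally from the robot's side (never from behind), matching the preference illustrated in Fig. 5. The frames, standoff distance and function are illustrative assumptions, not the exact ED/MoveIt interfaces.

```python
import numpy as np

def grasp_pose(object_xy, robot_xy, standoff=0.05):
    """Return a grasp position and yaw (map frame) just in front of the object."""
    obj = np.asarray(object_xy, dtype=float)
    robot = np.asarray(robot_xy, dtype=float)
    approach = obj - robot
    approach /= np.linalg.norm(approach)        # unit vector robot -> object
    grasp_xy = obj - standoff * approach        # stop just short of the object surface
    yaw = np.arctan2(approach[1], approach[0])  # gripper points along the approach vector
    return grasp_xy, yaw

print(grasp_pose(object_xy=(2.0, 0.5), robot_xy=(1.0, 0.0)))
```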

Fig. 5.

Grasping pose determination result for a cylindrical object with the TU/e-built robot AMIGO. Grasping the object from behind is not preferred.

3 Image Recognition

The image_recognition packages apply state-of-the-art image classification techniques based on Convolutional Neural Networks (CNNs).

  1. Object recognition: TensorFlow™ with a retrained top layer of an Inception V3 neural network, as illustrated in Fig. 6.

  2. Face recognition: OpenFaceFootnote 9, based on Torch.

  3. Pose detection: OpenPoseFootnote 10.

Fig. 6.

Illustration of the Convolutional Neural Network (CNN) used in our object recognition nodes with TensorFlow.

Our image recognition ROS packages are available on GitHubFootnote 11 and as Debian packages: ros-kinetic-image-recognition.
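For object recognition with a retrained Inception V3 top layer, inference can be sketched as below in the style of the TensorFlow 1.x retraining tutorial. The graph and label file paths and the tensor names ('final_result:0', 'DecodeJpeg/contents:0') are assumptions borrowed from that tutorial, not the image_recognition node itself.

```python
# Sketch of classifying an image with a retrained Inception V3 top layer
# (TensorFlow 1.x). Paths and tensor names are assumptions for illustration.
import tensorflow as tf

graph_path, labels_path, image_path = 'output_graph.pb', 'output_labels.txt', 'apple.jpg'
labels = [line.strip() for line in open(labels_path)]

with tf.gfile.GFile(graph_path, 'rb') as f:           # load the retrained graph
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    image_data = tf.gfile.GFile(image_path, 'rb').read()
    probs = sess.run('final_result:0', {'DecodeJpeg/contents:0': image_data})[0]
    best = probs.argmax()
    print(labels[best], probs[best])                   # most likely class and its score
```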

4 People Recognition

As our robots need to operate and interact with people in a dynamic environment, their people detection skills have been upgraded to a generalized system capable of recognizing people in 3D. In the people recognition stack, an RGB-D camera is used to capture the scene information. A recognition sequence is completed in four steps. First, people are detected in the scene using OpenPose; if their faces are recognized as one of the learned faces in the robots’ database, they are labeled with the known name using OpenFace. The detections from OpenPose are associated with the recognitions from OpenFace by maximizing the IoUs of the face ROIs. Then, for each of the recognized people, additional properties such as age, gender and shirt color are identified. Furthermore, the pose keypoints of these recognitions are coupled with the depth information of the scene to re-project the recognized people to 3D as skeletons. Finally, information about the posture of each 3D skeleton is calculated using geometrical heuristics. This allows for the addition of properties such as “pointing pose” and flags such as ‘is_waving’, ‘is_sitting’, etc.
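The association step can be sketched as follows: each OpenPose detection is labeled with the recognized face whose ROI overlaps it most, measured by IoU. The ROI format, dictionary keys and the greedy matching variant are simplifying assumptions for illustration.

```python
# Sketch of associating OpenPose detections with OpenFace recognitions by
# maximizing the IoU of their face ROIs. ROIs are (x, y, width, height) tuples.

def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def associate(pose_rois, face_recognitions, min_iou=0.3):
    """Label each pose detection with the best-overlapping recognized face."""
    labels = {}
    for i, roi in enumerate(pose_rois):
        scores = [(iou(roi, rec['roi']), rec['name']) for rec in face_recognitions]
        best_score, best_name = max(scores, default=(0.0, None))
        labels[i] = best_name if best_score >= min_iou else None
    return labels

print(associate([(10, 10, 40, 40)], [{'roi': (12, 8, 38, 42), 'name': 'john'}]))
```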

4.1 Pointing Detection

This year’s tournament challenges involved various non-verbal user interactions, such as detecting which object the user is pointing at. Our approach to people recognition, explained in the previous section, includes information about the posture of each 3D skeleton. Once the people information is inserted into the world model, additional properties can be added to the persons that also take other entities in the world model into account, e.g. “is_pointing_at_entity”. This information is used by the top-level state machines to implement challenges such as ‘Hand Me That’, the description of which can be found in the 2019 RulebookFootnote 12. Additionally, a check is inserted to ensure that the correct operator is found. This check is based on a spatial query, which makes it possible to filter out people based on their location. Finally, to determine at which entity the operator is pointing, ray-tracing is implemented. Figure 7 shows an example of the ray-tracing.
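The ray-tracing step can be sketched as below: cast a ray through the operator's 3D arm keypoints and pick the world model entity whose centre lies closest to that ray. The choice of keypoints (elbow and wrist), the distance threshold and the entity representation are assumptions for illustration.

```python
import numpy as np

# Sketch of pointing ray-tracing over 3D skeleton keypoints and entity centres.

def pointed_entity(elbow, wrist, entities, max_dist=0.4):
    elbow, wrist = np.asarray(elbow, float), np.asarray(wrist, float)
    direction = wrist - elbow
    direction /= np.linalg.norm(direction)            # unit pointing direction
    best_id, best_dist = None, max_dist
    for entity_id, centre in entities.items():
        v = np.asarray(centre, float) - wrist
        t = np.dot(v, direction)
        if t <= 0:                                    # entity lies behind the pointing hand
            continue
        dist = np.linalg.norm(v - t * direction)      # perpendicular distance to the ray
        if dist < best_dist:
            best_id, best_dist = entity_id, dist
    return best_id

entities = {'apple': (2.0, 0.3, 0.9), 'milk': (2.0, -0.6, 0.9)}
print(pointed_entity(elbow=(0.4, 0.1, 1.2), wrist=(0.7, 0.1, 1.1), entities=entities))
```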

Fig. 7.

Ray-tracing based on pose detection with AMIGO.

5 Human-Robot Interface

We provide multiple ways of interacting with the robot in an intuitive manner: the WebGUI (Subsect. 5.1) and the Telegram™ interface (Subsect. 5.2), which uses our conversation_engine (also described in Subsect. 5.2).

5.1 Web GUI

In order to interact with the robot, apart from speech, we have designed a web-based Graphical User Interface (GUI). This interface uses HTML5Footnote 13 with a Robot API written in JavaScript, and is hosted on the robot itself.

Fig. 8.

Overview of the WebGUI architecture. A webserver that is hosting the GUI connects this Robot API to a graphical interface that is offered to multiple clients on different platforms.

Fig. 9.

Illustration of the 3D scene of the WebGUI with AMIGO. Users can long-press objects to open a menu from which actions on the object can be triggered.

Figure 8 gives an overview of the connections between these components and Fig. 9 represents an instance of the various interactions that are possible with the Robot API.

5.2 Telegram™

The Telegram interfaceFootnote 14 to our robots is a ROS wrapper around the python-telegram-bot library. The software exposes four topics: images and text, respectively, from and to the robot. The interface allows only one master of the robot at a time. The interface itself does not contain any reasoning; this is all done by the conversation_engine, which is described in the following subsection.
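A minimal sketch of such a wrapper is shown below, covering only the two text topics: incoming chat messages are republished on a ROS topic, and text published by the robot is forwarded to the single active chat. The topic names, parameter name and single-master bookkeeping are simplified assumptions, not the actual node; the python-telegram-bot calls follow its standard Updater/MessageHandler API.

```python
#!/usr/bin/env python
# Sketch of a ROS wrapper around python-telegram-bot (text topics only).
import rospy
from std_msgs.msg import String
from telegram.ext import Updater, MessageHandler, Filters

rospy.init_node('telegram_bridge')
from_user_pub = rospy.Publisher('message_from_user', String, queue_size=10)
state = {'chat_id': None}   # only one 'master' of the robot at a time

def on_text(update, context):
    if state['chat_id'] is None:
        state['chat_id'] = update.message.chat_id          # first user claims the robot
    if update.message.chat_id == state['chat_id']:
        from_user_pub.publish(String(data=update.message.text))

def on_robot_text(msg):
    if state['chat_id'] is not None:
        updater.bot.send_message(chat_id=state['chat_id'], text=msg.data)

updater = Updater(token=rospy.get_param('~token'), use_context=True)
updater.dispatcher.add_handler(MessageHandler(Filters.text, on_text))
rospy.Subscriber('message_to_user', String, on_robot_text)
updater.start_polling()
rospy.spin()
```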

Conversation Engine

The conversation_engineFootnote 15 bridges the gap between text input and an action planner (called action_server). Text can be received either from speech-to-text or from a chat interface such as Telegram™. The text is parsed according to a (feature) context-free grammar, resulting in an action description in the form of a nested mapping, in which (sub)actions and their parameters are filled in. This may include references such as “it”.

Based on the action description, the action_server tries to devise a sequence of actions and parameterize these with concrete object IDs. To fill in missing information, the conversation_engine engages with the user; when the user supplies more information, the additional input is parsed in the context of what information is missing. Lastly, it keeps the user “informed” whilst actions are being performed by reporting on the current subtask.
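The example below illustrates what such a nested action description might look like and how missing information could trigger a question back to the user. The exact keys and the question logic are assumptions for illustration, not the real conversation_engine or action_server data structures.

```python
# Illustrative action description for "bring me the apple from the cabinet";
# keys and structure are assumptions, showing nested (sub)actions, parameters
# and references such as "it" that are resolved later.

action_description = {
    'actions': [
        {'action': 'navigate-to', 'target-location': {'id': 'cabinet'}},
        {'action': 'grab', 'object': {'type': 'apple'}},
        # "it" refers back to the apple and is resolved by the action planner
        {'action': 'hand-over', 'object': {'reference': 'it'},
         'target-location': {'id': 'operator'}},
    ]
}

def question_for_missing_info(description):
    """If a required parameter is missing, ask the user instead of failing."""
    for action in description['actions']:
        if action.get('action') == 'grab' and not action.get('object'):
            return "What would you like me to bring?"
    return None

print(question_for_missing_info(action_description))  # -> None, nothing missing here
```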

Custom Keyboard, Telegram HMI

The user interface modality explained above has been extended to reduce the room for operator error by presenting the user with only a limited number of buttons in the Telegram app. This has been realized through Telegram's custom_keyboardsFootnote 16 feature. This feature is especially useful if there are only a few options, such as when selecting from a predetermined selection of drinks, as was shown in our finals during RoboCup 2019.

Since the competition, this feature has been employed to compose commands word-for-word. After the user has already entered, via text or previous buttons, for example “Bring me the ...”, the user is presented with only those words that may follow that text according to the grammar, e.g. “apple”, “orange”, etc. This process iterates until a full command has been composed. This feature is called hmi_telegramFootnote 17.
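The sketch below shows how a limited set of options can be offered as buttons with python-telegram-bot's ReplyKeyboardMarkup. The drink list, command name and the idea that the options come from the grammar are assumptions for illustration, not the hmi_telegram implementation.

```python
# Sketch of a Telegram custom keyboard offering a few predefined options.
from telegram import ReplyKeyboardMarkup
from telegram.ext import Updater, CommandHandler

def ask_drink(update, context):
    options = [['coke', 'fanta'], ['water', 'beer']]   # e.g. next words allowed by the grammar
    update.message.reply_text(
        'Which drink would you like?',
        reply_markup=ReplyKeyboardMarkup(options, one_time_keyboard=True))

updater = Updater(token='TELEGRAM_BOT_TOKEN', use_context=True)   # placeholder token
updater.dispatcher.add_handler(CommandHandler('drink', ask_drink))
updater.start_polling()
updater.idle()
```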

5.3 Head Display

For most people, especially those who do not deal with robots in their day-to-day life, interaction with robots is not as easy as one would like it to be. It is often difficult to hear what the robot is saying, and it is not always intuitive to know when to talk to the robot. To remedy this, the head display of HERO is used. On this display, which is integrated in the Toyota HSR's ‘head’, a lot of useful information can be shown. Through the hero_displayFootnote 18, a few different functionalities are integrated. By default, our Tech United @Home logo with a dynamic background is shown on the screen, as depicted in Fig. 10. When the robot is speaking, the spoken text is displayed; when the robot is listening, a spinner is shown along with an image of a microphone; and it is also possible to display images.
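A minimal sketch of the "show what is being spoken" idea is given below: the node renders the spoken text onto an image and publishes it for the screen. The topic names and the image-based display mechanism are assumptions; the actual hero_display package may work differently (e.g. as a web page shown on the HSR's head screen).

```python
#!/usr/bin/env python
# Sketch: render spoken text as an image for a head display (assumed topics).
import cv2
import numpy as np
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from std_msgs.msg import String

rospy.init_node('head_display_sketch')
bridge = CvBridge()
image_pub = rospy.Publisher('display_image', Image, queue_size=1)

def on_speech(msg):
    canvas = np.zeros((600, 1024, 3), dtype=np.uint8)            # black background
    cv2.putText(canvas, msg.data, (40, 300), cv2.FONT_HERSHEY_SIMPLEX,
                1.2, (255, 255, 255), 2)                         # spoken text in white
    image_pub.publish(bridge.cv2_to_imgmsg(canvas, encoding='bgr8'))

rospy.Subscriber('text_being_spoken', String, on_speech)
rospy.spin()
```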

Fig. 10.

The default status of HERO’s head display.

6 Re-usability of the System for Other Research Groups

Tech United takes great pride in creating and maintaining open-source software and hardware to accelerate innovation. Tech United initiated the Robotic Open Platform websiteFootnote 19 to share hardware designs. All our software is available on GitHubFootnote 20, and all packages include documentation and tutorials. Tech United and its scientific staff (15+ people) have the capacity to co-develop, maintain and assist in resolving questions.

7 Community Outreach and Media

Tech United has organised three tournaments: the Dutch Open 2012, RoboCup 2013 and the European Open 2016. Our team member Loy van Beek was a member of the Technical Committee from 2014 to 2017. We also carry out many promotional activities to introduce children to technology and innovation. Tech United often visits primary and secondary schools, public events and trade fairs, and makes regular TV appearances. Each year, around 50 demos are given and some 25,000 people are reached through live interaction. Tech United also has a very active websiteFootnote 21 and is active on social media, including FacebookFootnote 22, InstagramFootnote 23, YouTubeFootnote 24, TwitterFootnote 25 and FlickrFootnote 26. Our robotics videos are often shared on the IEEE Video Friday website.