Introduction

When robots, mobile or stationary, operate in an environment close to humans, the importance of the human-machine, or more specifically the human-robot, interface increases. The primary purpose of robots and their applications is to support humans, whether in production environments, at work, at home, or for leisure. In all of these areas, the deployed robots have evolved, and this evolution includes the enhancement of their interfaces. Human-robot interaction has become a growing field of research [1].

However, the type of robot and the interfaces applied differ within the aforementioned sectors. The term human-robot interaction is sometimes used to refer to the interface itself and at other times to describe types of collaboration. Therefore, the first section of this review presents definitions of the term “human-robot interaction” and distinguishes between different robot types. Following this differentiation, the part of the literature review dealing with human-robot interfaces is split along these two robot types. Before that, an overview of human-robot collaboration is given, which focuses mainly on industrial applications and describes approaches for user interfaces in this context. A concluding section compares the interfaces in industrial applications and in service robotics.

Literature Review

Reflection of Human-Robot Interaction

Human-machine interfaces enable communication between human and machine. Since this review focuses on robotics, the more specific term human-robot interface is used in the following. Gautam et al. describe a human-machine interface as “a technique, which is used for controlling machines with human activities” [2]. Within the context of robotics, Zhang defines an HMI as follows: “A human–machine interface in a robotic system is a terminal that allows the human operator to control, monitor, and collect data, and can also be used to program the system”. The main purpose of the interface is “communicating information from the machine to the user, and communicating information from the user to the machine” [3].

ISO 8373 distinguishes between two types of robots. An industrial robot is “an automatically controlled, reprogrammable, multipurpose manipulator, programmable in three or more axes, which can be either fixed in place or mobile for use in industrial automation applications” [4]. Industrial robots are further classified by their mechanical structure, e.g., SCARA, Cartesian, or articulated robots; automated guided vehicles (AGVs) are not included. A service robot “performs useful tasks for humans or equipment excluding industrial automation application.” This raises the question of whether a robot used in an industrial human-robot collaboration should be defined as a service robot or as an industrial robot. This discussion is taken up in the Human-Robot Collaboration section.

In its report on service robots, the International Federation of Robotics (IFR) states that human-robot interaction refers to the exchange of information and action between human and robot through a user interface. Goodrich & Schultz give a differing definition of human-robot interaction (HRI): “HRI is a field of study dedicated to understanding, designing, and evaluating robotic systems for use by or with humans” [5]. This definition refers to the whole robot system, not only the user interface. Adding the word “physical” to the term changes its meaning to the kind of collaboration between human and robot, which can be supportive, collaborative, or cooperative according to Haddadin & Croft [6]. This raises the question of whether human-robot interaction is an umbrella term for the stages of collaboration or cooperation and whether its meaning as an exchange of information and actions fits into this definition. Several papers use HRI as an umbrella term, such as Tsarouchi et al. or Schmidtler et al. [7, 8]. This article focuses on the human-machine interface, which is considered a prerequisite for interaction, i.e., for the exchange of information.

Goodrich & Schultz describe media for human-robot interaction, including visual displays, gestures of hands and face, speech and natural language, non-verbal audio for alerts, and physical interaction and haptics [5]. These media are mostly used in combination to enable the information exchange between human and robot. A system that uses more than one human sense for information input or output is called a multimodal system.

Non-verbal communication, such as gestures, is deemed an effective channel in noisy manufacturing environments. Gestures are a popular communication mode among workers. They consist of movements of body parts and convey information or intentions to an observer [9]. For gesture recognition, two types of approaches are usually deployed: inertial-based and vision-based methods. Conversely, when human gestures were performed by a robot hand, most of them were accurately recognized by the participants [10].
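
As an illustration of the inertial-based approach, the following minimal sketch classifies a 3-axis accelerometer trace by nearest-template matching with dynamic time warping (DTW); the template set and gesture labels are hypothetical and not taken from the cited works:

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between two (T, 3) accelerometer traces."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # frame-to-frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return float(cost[n, m])

def classify_gesture(sample, templates):
    """Return the label of the nearest recorded template, e.g., 'wave' or 'point'."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))
```

Vision-based methods replace the raw sensor trace with tracked body or hand features, but the matching or classification step is structurally similar.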

Speech is a common and favored communication channel in many human-human interaction scenarios, and many researchers focus on speech recognition in the context of HRI. Microphones capture raw audio data, and the system interprets it as commands to instruct the robot. Speech recognition has been in use for several years, for example, in car navigation and information systems [11].
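
Such a command interface can be sketched as follows; this example assumes the open-source SpeechRecognition package for Python, and the keyword vocabulary is a hypothetical placeholder:

```python
import speech_recognition as sr

# Hypothetical command vocabulary; a real vocabulary would follow from the task.
COMMANDS = {"start": "START_TASK", "stop": "EMERGENCY_STOP", "home": "MOVE_HOME"}

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate against background noise
    audio = recognizer.listen(source)

try:
    text = recognizer.recognize_google(audio).lower()  # cloud-based ASR engine
    for keyword, command in COMMANDS.items():
        if keyword in text:
            print(f"dispatching {command} to the robot controller")
except sr.UnknownValueError:
    print("utterance not understood; ask the user to repeat")
```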

Human-Robot Collaboration

The term human-robot collaboration is mainly used for industrial applications in which human and robot share the same workspace. Several definitions exist for these kinds of applications. Kolbeinsson et al. describe it as a smooth transition from isolation towards collaboration of human and robot [12]. Schmidtler et al. distinguish between coexistence, cooperation, and collaboration [8]. In industrial applications, the robots used are mainly articulated robots with up to seven axes and integrated sensors, such as the lightweight robot presented by Albu-Schaeffer et al. [13]. According to the definitions described above, the purpose of industrial robots is automation, and that of service robots is to perform useful tasks for humans. Since a robot in a collaboration still performs tasks automatically but also supports the human, robots used in an industrial human-robot collaboration could be defined as both; Deuse et al., for example, call this phenomenon “service robotics” [14]. In the case of coexistence, however, the robot performs a task automatically, which matches the definition of an industrial robot. This article therefore includes human-robot collaboration within the definition of industrial robots.

Development and research on human-machine interfaces in this context often focus on interfaces for programming the robots. For this purpose, the common user interface is a touch display. Another possibility is haptic interaction, i.e., guiding the robot by hand for programming purposes. Pires et al. and Riedl et al. describe approaches in this field [15, 16].

Human-robot collaboration can be realized with stationary robots, but also with mobile robots. Within a collaboration of human and robot, the importance of the interface increases with the stage of collaboration and the autonomy of the robot. A close collaboration between human and robot requires interfaces not only for programming but also for the information exchange needed to confirm actions.

In contrast to human-robot collaboration in an industrial context, similar systems are deployed to support handicapped people. These applications are categorized as service robots according to the definition above, even though the robots used in research for these cases are often designed for industrial use.

HRI for Industrial Robots

For the industrial use of robots, stationary or mobile, many approaches for human-robot interfaces exist. Villani et al. identified safety and intuitiveness as the two essential requirements for human-robot collaboration in industrial settings [17]. Approaches that combine several modalities are described in the articles on multimodal interaction below. The human-robot interfaces used in industrial applications today are mainly the touch displays provided with the robot; these are becoming more intuitive to use through apps or programming blocks. Manufacturers do not yet offer other interaction technologies, such as gesture or speech recognition. However, research has proposed a number of interface approaches using different technologies, such as gestures, speech, or multimodal combinations.

Gesture control of robots in industrial settings is increasing. Liu & Wang present a review of gesture recognition for human-robot collaboration [18]. Qian et al. proposed a gesture-based remote human-robot interaction application using a Kinect sensor for the teleoperation of a robot arm [19]. Four gestures are evaluated in the experiment: WaveLeft, WaveRight, Riseup, and PutDown. The gesture recognition method combines depth information from the Kinect with a traditional Camshift tracking algorithm and employs hidden Markov models (HMMs) for dynamic gesture recognition [19]. Hugle et al. present a hybrid programming method combining robot guidance with pointing gestures [20]. The gestures are used to define approximate poses, which can be refined by robot guidance; in addition, a mobile AR interface supports the worker by presenting the poses and trajectories. Tsarouchi et al. present a different approach for programming an industrial robot using gestures [21]. They set up a gesture vocabulary for body and hand gestures, using an RGB-D camera to detect the body gestures and a Leap Motion controller for the hand gestures. Both types of gestures are used to move the robot arm in different directions. Tang & Webb describe a system to control industrial robots using gestures [22]. The focus of their paper is a user-centered development process based on RULA; they conclude that gesture control should be used in conjunction with a graphical user interface or augmented or virtual reality. The approach by Simao et al. uses a magnetic tracker and a data glove to obtain the data that feed neural networks for gesture recognition; the gestures control the robot via a virtual joystick [23]. Sadik et al. use hand gestures to interact with an industrial robot in a cooperative flexible manufacturing scenario [24].
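
The HMM-based classification step common to such dynamic gesture recognition pipelines can be sketched with the hmmlearn package; this is a simplified illustration, not the exact pipeline of [19], and the feature sequences (e.g., hand trajectories from depth tracking) are assumed to be given:

```python
import numpy as np
from hmmlearn import hmm

GESTURES = ["WaveLeft", "WaveRight", "Riseup", "PutDown"]  # labels as in [19]

def train_models(training_data):
    """Fit one Gaussian HMM per gesture class.

    training_data maps a label to a list of (T, D) feature sequences.
    """
    models = {}
    for label in GESTURES:
        sequences = training_data[label]
        X = np.concatenate(sequences)          # hmmlearn stacks sequences...
        lengths = [len(s) for s in sequences]  # ...and separates them by length
        model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[label] = model
    return models

def classify(models, sequence):
    """Assign an observed sequence to the gesture whose HMM scores it highest."""
    return max(models, key=lambda label: models[label].score(sequence))
```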

In summary, gesture recognition for industrial robots is primarily used to maneuver robots into specific poses or to change their direction, a process that is then integrated into the robot program.

Voice control is primarily used as an interface for robot control. Maksymova et al. presented various models for the voice control of an industrial robot in the context of an assembly task [25]. Voice control is also often combined with other modalities, such as gestures or eye gaze, to improve the accuracy of human command recognition.

Maurtua et al. present a semantic approach for multimodal interaction between humans and industrial robots that aims to enhance the dependability and naturalness of their collaboration in real industrial settings [26]. The approach is based on the recognition of verbal commands and gestures that communicate processing requests. Two tasks from a casting process are evaluated: a disassembly task (involving screwing and unscrewing operations) and the deburring of wax pieces. Ivaldi et al. reported on the influence of extroversion and attitude towards robots on the temporal dynamics of social signals (i.e., gaze towards the robot’s face and speech) during a human-robot interaction task in which a human must physically cooperate with a robot to assemble an object [27]. Some research focuses on the development of the software architecture. D’Haro et al. described a modular platform for operating and controlling industrial robots using speech and a task-oriented graphical interface [28]. The proposed architecture consists of independent modules that communicate via the Robot Operating System (ROS), which also allows the integration of external components. In addition, a control interface and a defined set of services and topics allow developers to configure, replace, or extend the provided modules, as well as re-train and update machine translation models and natural language understanding rules.
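
Module communication over ROS topics can be illustrated with a minimal node that bridges recognized speech to robot commands; the topic names and command strings below are hypothetical placeholders, not the interfaces of the platform in [28]:

```python
#!/usr/bin/env python
import rospy
from std_msgs.msg import String

def on_speech(msg):
    """Map a recognized utterance to a robot command and publish it."""
    utterance = msg.data.lower()
    if "pick" in utterance:
        command_pub.publish(String(data="PICK"))
    elif "place" in utterance:
        command_pub.publish(String(data="PLACE"))

rospy.init_node("speech_command_bridge")
command_pub = rospy.Publisher("/robot/command", String, queue_size=10)
rospy.Subscriber("/asr/utterance", String, on_speech)
rospy.spin()  # hand control to ROS until shutdown
```

Because each module only depends on the message types, a recognizer or robot driver can be replaced without touching the rest of the system, which is the point of such modular architectures.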

A multimodal interface approach for mobile robots is presented by Berg et al., in which hand and eye gestures provide the user input. A survey on feasible gestures was conducted within the project to define the gesture set. Furthermore, Berg et al. use a projector to display information about the robot system on the floor [29].

HRI in Service Robotics

The field of service robotics has gained significant attention since the end of the last century, and a large body of research has been published in various branches, e.g., entertainment, healthcare, nursing, and households. Because service robots are mostly used by non-experts, an intuitive and effective human-robot interface concept is necessary. As in the analysis for industrial robots, the human-robot interface strategies are described by modality, starting with gesture recognition. A range of other approaches exists as well, for example, guiding an electric wheelchair using head movements [2]. Jones & Schmidlin conducted a thorough review of HRI for personal service robots to facilitate the design of usable personal service robots [30].

Gesture-based interfaces are also very important for service robots, and some researchers focus on improving the recognition accuracy of gestures. Vision-based gesture interfaces have been an active research area since the end of the last century. Thanks to the development of sensor technology and graphics processing units, recognition accuracy has improved rapidly, and systems have become more robust even in complex environments [31]. Oyedotun & Khashman describe a vision-based gesture interface for the control of a mobile robot equipped with a manipulator [31]. The interface uses a CCD camera to track a person and recognize gestures involving arm motion. A fast, adaptive tracking algorithm enables the robot to track and follow a person reliably through office environments with changing lighting conditions. The system is trained to recognize four different gestures: stop, follow, pointing vertical, and pointing low. Results are reported in the context of an interactive clean-up task, in which a human guides the robot to specific locations that need to be cleaned and instructs the robot to pick up trash [32]. Sousa et al. present an approach to recognize hand gestures, including start, stop, and pause, based on RGB and depth images [33]. Within the computer vision community, several datasets are available for evaluating methods developed in the research field of HRI. Cho & Jeong constructed a gesture database with an autonomous mobile robot acting as a service robot [34•]. The gestures are recorded with various backgrounds, distances, and small variations of the poses, using a Kinect sensor (Ver. 2) to acquire the data. To evaluate the performance of a commercial service robot, an experiment was conducted using a hand-greeting gesture for guiding visitors in a lobby; the best performance was obtained at a distance of 2.5 m. Canal et al. introduced a gesture recognition method also based on the Kinect sensor (Ver. 2); the gestures implemented in the experiment are pointing at an object, waving, nodding, and head negation [35]. They also introduce a method for estimating the location indicated by a pointing gesture, in which a NAO robot has to approach a specific location, analyze which objects are present, and deduce which object the user was referring to. Besides vision-based gesture interfaces, inertial sensors are also widely investigated, e.g., by Stančić et al. and Jiang et al. [36, 37].
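
A common geometric core of such pointing-location estimation is intersecting the arm ray with the floor plane. The following sketch illustrates this under the assumption that skeleton joints are available in a world frame with z = 0 on the floor; it is a simplification for illustration, not the method of [35]:

```python
import numpy as np

def pointing_target_on_floor(shoulder: np.ndarray, hand: np.ndarray):
    """Intersect the shoulder-to-hand ray with the floor plane z = 0.

    Joint positions (in metres) could come from a Kinect body tracker.
    Returns None if the arm does not point towards the floor.
    """
    direction = hand - shoulder
    if direction[2] >= 0:            # ray parallel to or away from the floor
        return None
    t = -shoulder[2] / direction[2]  # solve shoulder_z + t * dir_z = 0
    return shoulder + t * direction

# Example: shoulder 1.4 m above the floor, hand slightly lower and forward.
print(pointing_target_on_floor(np.array([0.0, 0.0, 1.4]),
                               np.array([0.3, 0.1, 1.2])))  # -> [2.1, 0.7, 0.0]
```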

Speech recognition is mainly investigated from two perspectives: developing artificial languages to be used by humans or leveraging natural-language-based dialog systems. Mubin et al. designed an artificial language that humans can use to communicate with robots [38]. In an empirical study, this so-called Robot Interaction Language (ROILA) proved to be not significantly worse than English. A natural language interface is an intuitive and flexible mechanism for humans to instruct and respond to robots. Automatic speech recognition (ASR) is an important component for verbal information exchange, and thanks to deep learning, ASR engines have improved significantly in recent years; widely used engines include the Google API, the Microsoft API, and CMU PocketSphinx. The performance of ASR engines has been evaluated for child speech recognition during interaction with a NAO robot [39•]. In the experiment, the following data was collected: the numbers from one to ten, multi-word utterances based on spatial relationships between two nouns, for example, “the horse is in the stable,” and spontaneous speech prompted by a picture book such as “Frog, Where Are You?”. The results show that ASR engines do not yet work reliably with children and should be improved. A review of service robots and systems that can cope with uncertain information in natural language instructions was presented by Muthugala & Jayasekara [40]. Various scenarios like “Move a little forward” are evaluated.
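
Evaluations of this kind typically report the word error rate (WER). As a reference for the reader, a minimal WER computation via word-level Levenshtein distance is sketched below, using an example sentence from the test data above:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: edit distance between first i reference and j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("the horse is in the stable",
                      "the horse is in a stable"))  # 0.167: one substitution in six words
```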

Robotic systems are becoming increasingly complex, which requires new tools to manage the complexity and to interact with the different system components. Some researchers address this problem by developing convenient programming concepts based on graphical user interfaces [41, 42].

The combination of multiple modalities can improve the performance and robustness of a system. A typical multimodal HRI system consists of a graphical user interface, speech input and output, and a camera. Emerging modalities, such as tactile skins [43] and brain interfaces [44], are also considered in some systems. The design process of an HRI application normally starts from an application scenario, such as a service robot in a kitchen or an assistive robot for surgery. Böhme introduced a multimodal scheme for human-robot interaction that allows a service robot to work in an un-engineered, cluttered, and crowded environment [45]. In the experiment, a mobile robot operates in a homeware store as an intelligent interactive shopping assistant. An omnidirectional color camera attached to the robot, with a 360-degree panoramic view, is used for user localization and tracking, self-localization, and local navigation; a binocular six-DoF active-vision head with two frontally aligned color cameras is used for user localization and tracking, odometry correction, and obstacle avoidance; a binaural auditory system serves for acoustic user localization and tracking; and a touch screen enables immediate human-robot interaction. The authors demonstrated that the combined utilization of speech, the vision channel, and a graphical user interface running on a touch screen is a promising way to build interfaces for mobile service robots.

Foukarakis et al. discussed the use of a multimodal user interface development framework for developing elderly-friendly robotic applications [46]. The framework provides developers with the necessary technologies, tools, and building blocks for creating easy-to-use multimodal applications on robotic platforms; it supports speech input and output, gesture input, and touch-based graphical interaction. Cifuentes et al. presented an interaction strategy for human-walker cooperation based on the acquisition of human gait parameters through the fusion of data from inertial measurement units and a laser range finder [47].

Recently, data-driven approaches have been widely used for the interpretation of human behavior in the context of HRI. Trick et al. demonstrated an approach for multimodal intention recognition applicable to elderly assistance [48]. A collaborative task is evaluated in which a 7-DoF robot arm supports a human preparing food in a kitchen, while human intentions, such as the handover of a board or a tomato or the intention to stand up, are recognized online using speech, gestures, gaze directions, and scene objects. For capturing gestures and scene objects, the human’s hand and the scene objects are equipped with markers; speech is captured with a microphone and gaze with a head-mounted eye-tracker. The results of this work show that uncertainty can be decreased through the use of multiple modalities. Newman et al. presented a large multimodal dataset of human interactions in an assistive eating task [49]. During this task, a variety of participant data was collected, including eye gaze information, electromyography of the controlling arm, stereo video, and robot controller information. This dataset can enable research into human-robot collaboration and multimodal human behavior analysis. Celiktutan et al. introduced a multimodal human-human-robot interaction dataset with the aim of studying personality simultaneously in human-human interactions (HHI) and HRI and its relationship with engagement [50].
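
The benefit of combining modalities can be illustrated with a simple probabilistic fusion scheme. The following sketch assumes independent per-modality distributions over a hypothetical intention set; it is a simplification in the spirit of multimodal intention recognition, not the exact model of [48]:

```python
import numpy as np

INTENTIONS = ["hand_over_board", "hand_over_tomato", "stand_up"]  # example set

def fuse(modality_likelihoods, prior=None):
    """Combine per-modality distributions over intentions (naive Bayes).

    modality_likelihoods: one array P(observation | intention) per available
    modality (speech, gesture, gaze, ...). Missing modalities are simply
    omitted, which makes the fusion robust to sensor dropout.
    """
    posterior = np.ones(len(INTENTIONS)) if prior is None else np.asarray(prior, float)
    for likelihood in modality_likelihoods:
        posterior *= likelihood          # modalities assumed conditionally independent
    return posterior / posterior.sum()   # normalize to a probability distribution

speech = np.array([0.6, 0.3, 0.1])  # e.g., keyword spotting output
gaze   = np.array([0.5, 0.4, 0.1])  # e.g., fixation on the board
print(fuse([speech, gaze]))          # ~[0.70, 0.28, 0.02]: board intention dominates
```

The fused posterior is sharper than either single-modality distribution, which mirrors the reported finding that uncertainty decreases when multiple modalities are used.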

Ongoing Research

The project Sherlock (established in October 2018) introduces the latest safe robotic technologies, including high-payload collaborative arms, exoskeletons, and mobile manipulators, into diverse production environments. The goal is to enhance them with smart mechatronics and AI-based cognition, creating efficient HRC stations that are designed to be safe and guarantee the acceptance and well-being of the operators.

Within the project HR-RECYCLER, a “hybrid human-robot recycling plant for electrical and electronic equipment” is being developed. Novel techniques that are adaptive and personalized to each human worker will be developed.

The objective of the project KoBo34 is to develop a humanoid two-armed robot for intuitive haptic interaction with humans in order to assist elderly people.

Differences in Industrial and Service Robotics: a Personal View

The review of the articles shows that human-machine interfaces differ between service and industrial robots and that there is more research on interfaces for service robotics. Within the industrial sector, however, interfaces are gaining importance. The authors’ own work includes research on interaction technologies for industrial stationary and mobile robotics [29, 51]. Media already used in service robotics and other areas, such as speech or gestures, are adopted for the communication between humans and industrial robots. Because production environments differ, a multimodal approach is designed in which the human has several possibilities to communicate the same intention. One reason for the slower adoption in industry could be that it is not yet clear whether cameras or microphones may be used to capture data in an industrial setting. With the increasing mobility and autonomy of industrial mobile robots, the interface becomes more important, because an environment shared with humans requires an intuitive communication method that does not demand expert knowledge. However, the robot’s task in industrial applications is complex, so speech or gestures alone are often insufficient as inputs; different modalities must therefore be combined to build the user interface. In the authors’ experience, designing interfaces for industrial applications is challenging because the application scenarios differ significantly. Furthermore, there appears to be too little user integration in the design process of user interfaces for industrial robots. A trend towards more intuitive user interfaces for industrial robots has nevertheless started, as can be seen, e.g., at Universal Robots or Franka Emika.

In the service robotics area, research approaches use robots originally designed for industrial purposes to support target groups such as physically handicapped people; these robot types are then enhanced with user interfaces. Robots originally designed for interaction with people already have user interfaces, but these communicative robots, lacking a manipulator, cannot carry items. Current systems that do carry a manipulator, in turn, often still lack intuitive user interfaces. Therefore, a merging of both areas, service and industrial robotics, seems constructive and is being implemented, as seen in the approaches by Asfour et al. [52].

Future Research

Many questions remain to be answered within the field of HRI. The currently available approaches for industrial robots remain in their infancy. There is a need to combine existing interface approaches, such as gestures, speech, and touch displays. Especially for industrial mobile robots, there is a lack of interaction possibilities. Even though the approaches work in specific cases, there is little transferability between applications. To implement these robots in industry, comprehensive analyses should be carried out to facilitate the implementation. Robot manufacturers, especially those whose robots interact with humans, need to rethink the robots’ design and communication methods. The robots need interaction technologies, such as cameras, microphones, or improved touch displays, to enable information exchange by speech, gestures, or a display. Before a robot is equipped with these technologies, the information to be exchanged must be defined. For industrial applications, research must identify and evaluate which actions can be started or controlled by interaction and how the actions can be connected with a specific interaction mode. The challenge is to evaluate the large number of different tasks that appear in industrial use cases with regard to their suitability for simple communication. It is expected that enhanced interaction technologies will improve human-robot collaboration, because humans can communicate with the robot more easily, e.g., to assign tasks to it or to support it in a complex situation. Furthermore, this leads to greater acceptance of the robot.

Several service robots on the market that are intended to interact with people, such as Pepper, already possess interaction technologies. Future research in this field often focuses on robots in healthcare environments that are in direct contact with humans, for example, for feeding purposes. Current approaches need to be developed further, e.g., by enhancing the interaction capabilities. In this area, too, evaluations need to be conducted to improve the systems to a standard that facilitates implementation. Research on brain-robot interaction provides promising possibilities, and there is, of course, still room for further work.

Another aspect of this discussion is data privacy. To use speech or gesture recognition, data must be acquired, and ensuring the privacy and security of this data is an important step towards bringing these interaction technologies into use. Figueiredo et al., for example, present an approach for ensuring security and privacy when using cameras such as the Microsoft Kinect [53].

Conclusions

This review gives an overview of current research on human-robot interfaces in different sectors, such as industrial applications and service robotics. The discussion of whether robots used in an industrial human-robot collaboration are industrial or service robots results in the statement that they could be defined as either; the authors count them as industrial robots. Summarizing the topic of human-robot interfaces, more approaches exist for human-robot interaction in the area of service robotics than for industrial robotics. However, the interaction technologies must also be improved for industrial robotics, where human and robot work in the same area, because humans need to be able to communicate with the robot in an easy or even natural manner. For this, robot manufacturers need to integrate more interaction technologies into their robots. To convince manufacturers to integrate these technologies, more evaluation is required that demonstrates the added value of human-robot interaction for the joint work of human and robot. Once this value is demonstrated, robot manufacturers can adopt research approaches for human-robot interaction, such as speech, gestures, or improved touch displays.