1 Introduction

As people age, they need more support due to their declining capabilities and age-related illnesses. Today that support is provided by younger caregivers. However, the ratio of younger caregivers to elderly people is decreasing in all developed countries. Worldwide, in 2010 there were fewer than nine working-age persons per elderly person aged 65 or older. By 2050 this ratio is expected to decrease to fewer than four working-age persons per elderly person [10]. This will lead to an enormous societal problem due to an increasing demand for care and a shortage of caregivers [52].

Smart homes are one solution to this problem. Smart homes include a communications network that allows the various systems and devices in the home to communicate with each other. The modern home contains a variety of systems, such as central heating, fire and security alarms, and devices, such as televisions and lights, that usually exist in total isolation from each other. In a smart home, these systems and devices are able to pass information and commands between them so that, for example, the security alarm can turn the lights on or off. Smart homes have also been equipped with systems that proactively assist elderly persons in everyday tasks, such as communication with the external world and reminders about medication and health checks [13]. A number of smart home projects and research studies have been undertaken in Europe and worldwide [4, 36, 37, 63].

Socially assistive robots (SARs) are another solution to this problem. Several EU projects have addressed modeling, defining, and implementing social and cognitive skills in SARs and human-robot communication [14, 16, 23, 24, 33, 40, 51]. SARs can provide reminders and instruction, as exemplified by the nursing robot Pearl [47] and the Korean robotic language teacher EngKey [31]. They can also provide social support, which typically aims at reducing social isolation and enhancing well-being through social interaction with users [20]. The robot’s physical embodiment and multimodal communication channels allow it to communicate with users (verbally and non-verbally) in a social manner so that users benefit from the interaction. Research suggests that people tend to attribute human-like characteristics to robots [9, 21, 31] and regard them more as companions than tools [6, 61]. As such, robots can reduce the feeling of social exclusion and the level of stress, both common problems of aging [8, 11].

Both of these solutions to the demographic problem have their drawbacks. A smart home requires the elderly to go to different parts of the house to access its services. On the other hand, the presence of a robot is insufficient by itself because it needs to interact with the home environment to obtain relevant information that only external devices can provide (e.g., pollution levels, environmental conditions) and to control various systems and devices in the home.

The present paper describes the rationale and results of the KSERA (Knowledgeable SErvice Robots for Aging) project [29]. The KSERA project integrated smart home technology and a socially assistive robot to extend independent living for elderly people. KSERA helps elderly people with daily activities and care needs. The robot is the most visible component of the system, playing the role of communication interface between the elderly, the smart home, and the external world. The robot’s behavior is determined in part by sensor information gathered through the smart home and in part through interaction with its user. To ensure user acceptance, the KSERA system was developed based on user needs, treatment plans, and laboratory studies and validated through user studies and field trials. This resulted in a number of key technologies that are required for successful adoption of such systems. In particular, we developed and implemented person- and self-localization abilities, person-aware navigation, speech recognition and generation, robot gestures, emulated emotions, eye contact and joint attention, and audio-video communication with family members and caregivers. In Sect. 2 of this paper we describe the approach we used to design KSERA; in Sect. 3, we give a technical description of the KSERA system; and in Sect. 4, we discuss conclusions and future applications of such systems.

This paper consolidates our research results obtained during the course of the project, some of which have been published in other journals and conferences, into a holistic view of the KSERA project. We believe that this holistic view yields a better understanding of the requirements for SARs when applied to real-world scenarios in healthcare. In addition, it provides a quick introduction to and overview of the latest developments in key aspects of social robotics, highlights the scientific contributions and technical improvements we have made, and describes some of the difficulties that we overcame.

2 The KSERA Approach

In this section we present the approach we used to design the KSERA system. In particular, we will describe the user-centered approach and user needs (Sect. 2.1), metrics we used to measure the HRI quality, usability, user acceptance, and improvement in quality of life (Sect. 2.2), how we developed ethical guidelines (Sect. 2.3), and finally how we validated the approach with user studies (Sect. 2.4), field trials (Sect. 2.5), and user experience studies (Sect. 2.6).

2.1 User-Centered Approach and User Needs

During the development of the KSERA system we adopted the User-Centered Design (UCD) framework to link the design with the needs and life context of the people to whom it is addressed [42, 46]. The UCD approach is also described by international standards [25, 26]. This framework is explicitly intended as a dynamic process in which the end users are involved from the very beginning of the project, not as subjects, but as active agents influencing decisions, development, and implementation. Our aim was to address the user experience with the KSERA system as a whole. Therefore, this approach is particularly relevant as it moves the focus from the simple behavioral responses of users interacting with system components to more complex interactions among the users, the system, and the physical and social environment.

Our UCD approach starts with gathering the user needs that the integrated system should address. We identified three user groups, each with its own needs: primary users (i.e., elderly), secondary users (i.e., friends, family members, relatives, formal and informal caregivers, medical professionals, and call center personnel), and tertiary users (i.e., installation and maintenance technicians, service providers, insurance companies, municipalities, architects, social agencies, and guarantors of privacy, safety, and ethical procedures). While the needs of the primary users certainly are key, the needs of the secondary and tertiary users are considered as well to ensure high acceptance of KSERA.

During the initial KSERA research phase, we conducted three focus groups with primary users (n=17, age 63–89) in Schwechat (Austria) who were recruited from local clubs for older people. The focus groups were divided into free talks and discussion. The purpose of the meetings was to identify general needs in daily life, disease management, social inclusion, and support by technology. Before introducing the KSERA approach, the researchers interviewed the participants in the focus groups in depth concerning their needs, their disease management, and medical problems. Then, the KSERA approach was introduced and the researchers interviewed the participants on how they thought the KSERA system might be helpful in addressing their needs. From these focus groups, we identified the following user needs (UN) of the primary users:

  1. UN1: Users with walking difficulties need a system which has a mobile interface that can approach them.

  2. UN2: Users need a solution when they feel isolated and lonely or miss social contacts. Users need to see and speak to their families and to talk with medical caregivers (this may be via a remote connection).

  3. UN3: Users need to be reminded to take the right medication at the right time.

  4. UN4: Users need motivation to do physical exercises and need positive feedback about the accomplishment of personal goals.

  5. UN5: Users need to know whether the outside environmental conditions are safe to venture out into (e.g., COPD-related breathing problems).

  6. UN6: Users need help when their vital signs are outside the normal range (e.g., an emergency call to a medical call center).

  7. UN7: Users need a system that can be installed and integrated easily into existing homes. (This is also a tertiary user need.)

Secondary and tertiary user needs were gathered from interviews with medical and care experts. The medical experts were geriatric doctors and pulmonologists from hospitals in Austria and Israel, members of the Austrian lung union, members of the European Federation of Allergy and Airway Diseases Patients Association (EFA), and the Austrian EFA spin-off. The doctors from Israel included physicians who treat COPD patients on a daily basis and know their profiles, medical conditions, and needs. We also interviewed care experts from senior citizen centers in Austria and Israel. We used a structured interview scheme with ample time for free discussion and input from the experts. Topics that were discussed included: (1) the problems and needs of COPD patients in daily life and how technical solutions could affect their daily life in a positive way, (2) medical and vital parameter assessment and disease treatment tactics, (3) things that could harm the patients more than help them, (4) the possibilities of the KSERA system and a detailed description of them, (5) the needs of COPD patients with different characteristics concerning the KSERA system, (6) an order of importance, for instance, what is most important concerning medical status and daily life for relatives and caregivers, and (7) direct patient involvement in the development of the KSERA system and their participation in laboratory, field, and real-life tests. From these interviews, we identified the following secondary user needs:

  1. UN8: Caregivers need to be informed in case physical exercises were not completed or medical measurements were not undertaken or are out of range.

  2. UN9: Secondary users need an audio/video connection for communication with primary users.

And finally, we identified the following tertiary user needs:

  1. UN10: Tertiary users need a system that can be installed and integrated easily into existing homes, just as the primary users do (UN7).

  2. UN11: Tertiary users need systems that are affordable and lower the high health care costs for this population while not decreasing the quality of service.

2.2 Metrics

Currently, there is no validated way to measure the HRI quality, usability, user acceptance, and improvement in quality of life (QoL) of the KSERA system. However, some of the evaluation and validation metrics of Ambient Assisted Living (AAL) and UCD apply to HRI, even though they usually do not involve the interaction between a person and a robot. The KSERA system was evaluated with the users in different ways, according to the level of maturity of the system (i.e., system design vs. prototype). Both quantitative and qualitative metrics were used to evaluate KSERA as it progressed. Quantitative metrics include Likert-scale-based questionnaires, the Godspeed questionnaires [7], KSERA ad hoc questionnaires [43], the WHO Quality of Life questionnaire (WHOQOL), the Positive and Negative Affect Schedule (PANAS), and system performance metrics. Qualitative metrics include focus group discussions, free interviews, and think-aloud protocols. This mix of metrics and metric domains offers the opportunity for a holistic evaluation of the KSERA approach in particular and of other SAR approaches in general.

2.3 Ethical Aspects

Ethical guidelines are extremely important in research involving AAL and SARs because such research deals with vulnerable people [48–50]. Within the KSERA project a dedicated ethical manager was appointed to supervise ethical, privacy, safety, and trust-related issues. Ethical guidelines were developed throughout the project with the view of addressing issues specifically relevant to user tests. The goal of these ethical guidelines is to ensure good research practice and an acceptable KSERA product. The ethical guidelines include: (1) identification of the key ethical and legal issues that need to be addressed throughout the project (e.g., privacy protection for the ceiling-mounted camera and microphone); (2) identification of national guidance documents, legislation, and governance bodies in ethics that are relevant for the national KSERA partners in implementing their work; (3) laying down the principles and practices for appropriate user participation, including recommendations for recruitment, methods, and timing of user involvement; (4) providing hands-on guidance for creating a process for informed consent and drafting informed consent documents at the project’s sites; and (5) identification of other relevant guidance documents (e.g., professional codes of conduct, documents on principles of research ethics) that will support the work of the KSERA project.

The ethical manager is complemented by a team of three external experts forming an ethical advisory board (EAB) that approves the guidelines developed in the KSERA project and their implementation within the project. The EAB participates in major meetings, where its members gain hands-on experience with the system before actual user tests are performed. The EAB clears the envisaged tests and, if necessary, the test plans are changed according to its recommendations prior to the actual start of tests involving real users.

2.4 User Studies

We conducted a number of user studies in an iterative design process, which is one of the strengths of the UCD approach. These studies were carried out in Italy and the Netherlands to ensure KSERA was useful and acceptable to the primary users. Below are brief descriptions of what was done during each study and what we learned from each study.

2.4.1 Waving Is the Best Way to Attract a User’s Attention

At times, the robot may need to give critical information to a patient regarding his or her health. To do this reliably, the robot needs to gain the patient’s attention. In a laboratory study [56], we compared different actions to attract the attention of participants watching the news on TV. By measuring reaction time, we discovered that the robot was able to gain the participant’s attention faster by waving than by speaking, blinking, or making eye contact. By asking the participants to evaluate the four actions using a 5-point Likert scale for the items: clarity, presence, and friendliness, we also found that the participants preferred waving as an action for gaining their attention over speaking, blinking, or making eye contact.

2.4.2 Gestures and Gazing Make a Robot More Persuasive

The robot might need to persuade the patient to do something, for example, to take their medicine, to exercise, or not to go outside because the air quality is bad. We investigated whether a robot that uses gaze and gestures is more persuasive than a robot that uses only gaze or only gestures [22]. We investigated the combined and individual contribution of gestures and gazing on the persuasiveness of a robot extolling the aversive consequences of lying. We found that gazing by itself made the robot more persuasive; and using gestures with gazing made the robot even more persuasive. Without gazing, gestures made the robot less persuasive.

2.4.3 Physical Exercise with a Robot Improves a User’s Attitude Toward It

People with COPD and older people in general need to exercise regularly. We assessed whether people like doing exercises with a robot by asking them to fill out a Godspeed questionnaire [7] before and after doing a health exercise with a robot [56]. The results suggest that performing physical exercise with a robot improves the attitude of people toward it in terms of Likeability. There were no negative effects on the other dimensions of the Godspeed questionnaire.

2.4.4 Attitude Toward a Robot Depends on Context, Not Walking Behavior

From a user perspective, the robot is the most visible component of KSERA. Thus, if the robot is perceived as trustworthy and likeable, the KSERA system will be perceived as trustworthy and likeable, and users will be more likely to accept it. Presumably, a robot that moves and interacts intelligently is more trustworthy and likeable. In three scenarios of differing urgency, we tested whether an anticipatory walking behavior of a robot is perceived as more intelligent and whether this results in a more positive attitude toward the robot [15]. We found that the context of the interaction, as manifested in the urgency of the scenario, had more effect on the perceived intelligence of and attitude toward the robot than the anticipatory walking behavior of the robot itself.

2.4.5 Communication Through a Robot Is Preferred over a Smart Home

To quantify the added value of a robot as a communication channel over a smart home, we measured how quickly people could categorize messages delivered by a robot in comparison to a loudspeaker simulating a smart home [56]. The difference in reaction times to the same stimulus, provided by different agents, gives an indication of the workload that an agent imposes on the mental processing of the participant. The participants listened to messages belonging to one of four categories: (1) health, (2) leisure time, (3) household activities, and (4) energy savings. We found that reaction times for responses to messages presented by the robot were not significantly different from those presented by the smart home, except for the health category. We also measured user experience using the Godspeed questionnaire [7]. We found that the participants liked the robot more (as indicated by the responses to the Likeability dimension of the Godspeed questionnaire) and considered it more alive (as indicated by the responses to the Animacy dimension) than the smart home. The participants’ preference for communicating through a robot over a smart home was confirmed in a field trial of the first prototype of the KSERA system in Vienna, Austria, as described below.

2.4.6 Users Prefer a Larger Personal Space with a Robot

For effective communication, the robot needs to approach a user without violating the user’s personal space. In a laboratory test, we measured the optimal distance and angle of approach in terms of the user’s experience [54, 55]. Participants were instructed to stop the robot at a comfortable distance for verbal communication by pressing a button. There were three different conditions: participants were asked to stop the robot at the optimal distance, the closest comfortable distance, or the furthest comfortable distance. We found that a human’s personal space with respect to a robot is larger than with respect to another human and can be approximated with second-order polynomial functions.

2.5 Field Trials in Austria and Israel

We also conducted field trials to evaluate the complete KSERA system in a real-world environment with real end users. The field trials were carried out at two senior citizen homes, one in Schwechat, Austria and the other in Tel-Aviv, Israel. See [56] for the details of these field trials (e.g., method description, participant demographics, procedure). The key findings of the field trials are: the KSERA system and Nao robot are likeable; the attitude toward the Nao robot is highly correlated with the attitude toward the system; the Nao robot motivated the users to perform exercises; the Nao robot was preferred over a stationary touch screen for motivating the user to do a task, for raising attention, for motivating exercise, and for explaining a task; and a stationary touch screen was preferred over the Nao robot for providing and retrieving information.

2.6 User Experience Studies

User experience studies were conducted to validate the approach before the prototype was completed. These studies were conducted by the Istituto Superiore Mario Boella (ISMB) in Torino, Italy to assess the usability of KSERA as specified in ISO 9241 (i.e., effectiveness, efficiency, and satisfaction). These studies focused on the information generated by the ubiquitous monitoring system (UMS) and communicated to the user by the humanoid robot. The usefulness of the available information, the clarity of the communication, and the persuasiveness of the information provided by the robot were the interaction aspects evaluated from the user’s perspective during the usability tests.

2.6.1 Methods

In accordance with the guidelines on Human Centered Design of interactive systems [25], six Italian participants (2 men, 4 women; aged 56–93 years; mean age = 70) not involved in the project tested the KSERA system at the ISMB e-Health laboratory in a simulated home environment. The sample was intended to represent young seniors, old seniors, and the oldest seniors. They were recruited with the following inclusion criteria: willingness to participate, cognitively intact, technologically literate, and suffering from a chronic but not debilitating disease (e.g., COPD (Chronic Obstructive Pulmonary Disease), hypertension, or diabetes). Five test scenarios were developed to allow the users to experience the main functionalities of the KSERA system. The test scenarios were: (1) receiving information about the outdoor environmental conditions; (2) home environment control through the KSERA system; (3) receiving support in medical measurements; (4) receiving support in doing exercises; and (5) making an emergency call. During the interaction with the system in the five scenarios, data were collected using audio-video recordings, situated interviews, think-aloud protocols, and self-assessment questionnaires.

To measure the user’s awareness of the benefits and risks of KSERA, the participants were asked whether they totally disagreed, partially disagreed, partially agreed, or totally agreed with three statements beginning with “Systems like KSERA, which are able to check and manage the home environment, can be …” and ending with “… a source of savings”, “… helpful”, and “… a risk”. The interpretations of the savings, helpfulness, and risks were left up to the participant.

2.6.2 Results

The tests showed that the robot interface enhanced people’s use of and interaction with KSERA, improving acceptance of the system. The main reason for this appears to be that speech recognition simplified the interaction with the complex system features. Besides viewing the robot as harmless, the participants believed the robot offered reliable information about indoor and outdoor conditions. In their mental model, each system component they tried was logically interconnected with the others. They felt the personal support provided by KSERA in meeting their individual needs enabled them to pursue a healthy and active lifestyle.

User tests showed that people are aware of both the benefits and the risks of using KSERA. All of the participants agreed that systems like KSERA, which are able to check and manage the home environment, can be a source of savings (Fig. 1, χ²(3)=18, p<0.01); two-thirds of the participants agreed that systems like KSERA can be helpful (Fig. 2, χ²(3)=3.3, p=0.34); and all of the participants disagreed that systems like KSERA are a risk (Fig. 3, χ²(3)=11.3, p=0.01).

Fig. 1 Systems like KSERA, which are able to check and manage the home environment, can be a source of savings. 100 % partially agree. χ²(3)=18, p<0.01

Fig. 2 Systems like KSERA, which are able to check and manage the home environment, can be helpful. 17 % totally agree; 50 % partially agree; 33 % partially disagree. χ²(3)=3.3, p=0.34

Fig. 3 Systems like KSERA, which are able to check and manage the home environment, can be a risk. 83 % partially disagree; 17 % totally disagree. χ²(3)=11.3, p=0.01
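
For illustration, the reported statistics can be reproduced as chi-square goodness-of-fit tests against a uniform distribution over the four response options, using the response counts implied by the percentages in Figs. 1–3 for the six participants. The following sketch assumes SciPy and this uniform-expectation reading of the analysis.

```python
# Goodness-of-fit check for the agreement distributions in Figs. 1-3.
# Assumes six participants and a uniform expected distribution over the
# four response options (totally disagree, partially disagree,
# partially agree, totally agree); counts are inferred from the
# reported percentages.
from scipy.stats import chisquare

observed = {
    "source of savings (Fig. 1)": [0, 0, 6, 0],  # 100 % partially agree
    "helpful (Fig. 2)":           [0, 2, 3, 1],  # 33 % / 50 % / 17 %
    "a risk (Fig. 3)":            [1, 5, 0, 0],  # 17 % totally, 83 % partially disagree
}

for statement, counts in observed.items():
    chi2, p = chisquare(counts)  # expected frequencies default to uniform
    print(f"{statement}: chi2(3) = {chi2:.1f}, p = {p:.3f}")
```

Running this yields χ²(3)=18.0, 3.3, and 11.3, matching the values reported above.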

Participants understood that the system is able to autonomously and proactively control the environmental conditions, applying actions governed by static and predefined rules. Within the sample, 83 % of the participants stated they appreciated these capabilities. However, they preferred more interaction with the robot to understand and decide on the possible actions for the robot to take. They wanted to be asked for explicit confirmation before any action was taken by the system. Moreover, they wanted the ability to abort any activity that the system initiated. They also wanted the capability to modify the predefined rules.

Observations of the natural dialog between the participants and the robot during the tests revealed, in several different situations, a sense of comfort and trust on the part of the participants. The natural dialog with the robot influenced the evaluation and the acceptance of other system features, such as the environmental control. To determine whether the main KSERA environmental control features were suitable, the participants were asked, “Which automatic environmental control would you accept at home?”. Fig. 4 shows the preference distribution. Except for the first item of preference, control of heating by thermostats, which is a function already known and used in homes, these are new functions related to improving domestic and personal safety (e.g., checking the main door or detecting gas leaks). When asked in follow-up interviews, the participants said they expect to manage these controls by interacting with the robot.

Fig. 4 Automatic environmental control features

In interviews using open-ended questions, participants revealed they would like to see the following additional capabilities in KSERA: setting and receiving alerts about things to do (e.g., therapy, appointments), receiving a wide range of information from the external world (not only about the environment), entertainment, easy use of dedicated devices (e.g., TV, CD, DVD player), and support for activities such as newspaper and book reading.

3 KSERA System

In this section we cover the KSERA system design, paying attention to how it meets the users’ needs.

3.1 System Architecture

The architecture depicted in Fig. 5 allows us to address the user needs by providing a framework for enabling independent living in smart homes through social and functional interaction with the robot. Looking at Fig. 5, we can see that KSERA potentially can meet the user needs through (1) a mobile robot that follows and monitors the health and behavior (i.e., joint attention as described below) of a senior (UN1, UN2, UN3, UN4, UN5, UN6, UN9), (2) audio-video and internet communication services including alerts to caregivers and emergency personnel and communication with family members (UN2, UN6, UN8, UN9), and (3) a robot integrated with smart household technology to monitor the environment and advise the senior or caregivers of anomalous or dangerous situations (UN5).

Fig. 5 Overall KSERA architecture

The core element of the system is the KSERA intelligent server, which allows connection and coordination of the three main functionalities of the system. It is composed of three main units: the robot controller, which is responsible for realizing the robot’s behaviors; the rule engine; and three databases (the personal data memory, the knowledge base, and the HRI memory). The rule engine reads the environmental and medical sensor data stored in the personal data memory, interprets these values by consulting the knowledge base, and triggers appropriate robotic behaviors, which are stored in the HRI memory. Robotic behaviors are then realized by the robot controller, which navigates the robot to the target place in the home environment and then uses the robot to interact verbally and non-verbally with the primary user.

The HRI is context dependent. For example, the robot might need to report a medical measurement, warn the user to stop an activity because of abnormal external conditions (e.g., high pollution levels) or abnormal medical readings (e.g., low blood oxygen level), or interrupt the current activity because a family member or caregiver wants to communicate with the senior. This can only be realized by constant communication of the environmental and medical sensors, call centers, and caregivers with the personal data memory, the rule engine, and finally the robot controller.

KSERA uses a number of modalities to make the interaction between the human and the robot more friendly and likeable, including speech recognition and generation, gaze, gestures, and imitating emotions. In the following sections, we further discuss the four parts of the architecture: smart sensing and smart activation (Sect. 3.2), intelligent server (Sect. 3.3), SAR mobile and interaction behaviors (Sect. 3.4), and audio-video communication with family members and care givers (Sect. 3.5). This is followed by an example showing how all of these parts work together to meet the user’s needs (Sect. 3.6).

3.2 Smart Sensing and Smart Activation

Remote care services for elderly persons rely on remotely acquired sensor data that should be available to care professionals to assist them in monitoring the patient’s health [1, 2, 30, 32, 38, 45]. The sensor data include personal data (e.g., heart rate, blood pressure), acquired by multi-sensorial wearable devices and smart medical devices, and environmental data describing the surrounding indoor and outdoor conditions. The main personal data for COPD patients are SpO2 (blood oxygen saturation) and FEV (forced expiratory volume, a measure of air flow capacity), while the main environmental data for COPD patients are temperature, humidity, and pollution by fine particles (i.e., PM10 and PM2.5).

Sensor data is stored in the personal data memory of the KSERA server (Fig. 5). The data are analyzed by the rule engine and interpreted by means of the knowledge base. This analysis determines what actions the system needs to take through its main interface: the robot.

Common systems to gather person-related data are sometimes invasive. Research studies show that ubiquitous monitoring becomes less invasive when wireless protocols such as Bluetooth or ZigBee are used. Progress in sensor miniaturization and energy consumption makes it easy to embed sensors in the environment [62]. Based on these findings, ubiquitous monitoring in KSERA relies on Bluetooth wireless communication, as illustrated in Fig. 6.

Fig. 6 KSERA sensory system

Certain environmental parameters are cheaper to acquire through external public services. One example is outdoor air quality (fine particles). In KSERA, the air quality parameters (i.e., PM10 and PM2.5) are acquired through web services on a daily basis in the early morning.
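
To illustrate this daily acquisition, the sketch below polls a hypothetical air-quality web service and returns the readings for storage in the personal data memory. The endpoint URL, query parameters, and JSON field names are illustrative assumptions, not the actual service used in KSERA.

```python
# Sketch: fetch daily outdoor air-quality data (PM10, PM2.5) from a
# public web service. URL and JSON field names are placeholders.
import datetime
import requests

AIR_QUALITY_URL = "https://example.org/air-quality/latest"  # hypothetical endpoint

def fetch_outdoor_air_quality(station_id):
    response = requests.get(AIR_QUALITY_URL,
                            params={"station": station_id}, timeout=10)
    response.raise_for_status()
    data = response.json()
    return {
        "timestamp": datetime.datetime.now().isoformat(),
        "pm10": data["pm10"],      # daily mean, ug/m3
        "pm2_5": data["pm2_5"],    # daily mean, ug/m3
    }

if __name__ == "__main__":
    reading = fetch_outdoor_air_quality("schwechat-01")  # illustrative station id
    print(reading)  # in KSERA this would be written to the personal data memory
```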

KSERA stores the sensor data in one common format, XML/RDF. This allows semantic web applications using the Web Ontology Language (OWL) to further analyze the data trends. The raw data are collected and can be used by different agents, such as doctors, caregivers, and the patients themselves. The system uses the information to make decisions about what activities the robot should perform. In addition to directing the robot to perform various activities, the system can control the smart home directly (e.g., turn a light on or off, change the thermostat). For KSERA, we used the eHome actuators as illustrated in Fig. 5 [37].
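
As an illustration of this storage format, the minimal sketch below serializes a single sensor reading as RDF/XML with the rdflib library. The namespace and property names are illustrative placeholders, not the project’s actual ontology.

```python
# Sketch: store one sensor reading as RDF so that semantic web tools
# (OWL reasoners, SPARQL queries) can analyze trends later.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

KSERA = Namespace("http://example.org/ksera#")  # hypothetical namespace

g = Graph()
g.bind("ksera", KSERA)

reading = KSERA["reading-2012-05-14T09-30"]
g.add((reading, RDF.type, KSERA.SpO2Measurement))
g.add((reading, KSERA.patient, KSERA["patient-001"]))
g.add((reading, KSERA.value, Literal(94, datatype=XSD.integer)))   # SpO2 in %
g.add((reading, KSERA.timestamp,
       Literal("2012-05-14T09:30:00", datatype=XSD.dateTime)))

print(g.serialize(format="xml"))  # RDF/XML, the common storage format
```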

Actuators also allow the robot to initiate and accept video calls from secondary users (e.g., family, friends, and medical professionals). The video is displayed by the robot with the LED video projector. This allows the elderly to establish contact with the outside world without having to get up and move to a video display screen.

3.3 Intelligent Server

The intelligent server contains a rule engine, which is a set of rules that determines what the system does when an event occurs. Events include changes in sensor data, requests from the user, and time (e.g., it is time for the senior to perform exercises or take medication). The rules were determined by domain experts. The rules are stored in parameter tables and thus can easily be customized for each user’s unique case of COPD or changed as the expert knowledge changes. The parameter tables are constructed as IF-THEN clauses: IF a specified event occurs, THEN the system takes a specified action. The IF clause specifies the event as a combination of conditions (e.g., the outside air PM10 count exceeds 50 for more than 20 days in a row); the THEN clause specifies the action the system takes when the event occurs (e.g., warn the patient not to go outside).
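
A minimal sketch of this parameter-table idea follows: each rule pairs an IF clause (a predicate over the current sensor context) with a THEN clause (an action name handed to the robot controller). The rule names, context keys, and thresholds are illustrative, not the actual expert-defined tables.

```python
# Minimal sketch of an IF-THEN parameter table and its evaluation.
# Rule names, context keys, and thresholds are illustrative only.
RULES = [
    {
        "name": "unhealthy_outdoor_air",
        "if": lambda ctx: ctx["pm10_days_above_50"] > 20,
        "then": "warn_not_to_go_outside",
    },
    {
        "name": "low_blood_oxygen",
        "if": lambda ctx: ctx["spo2"] < 90,
        "then": "call_medical_call_center",
    },
]

def evaluate_rules(context):
    """Return the actions triggered by the current sensor context."""
    return [rule["then"] for rule in RULES if rule["if"](context)]

# Example: context assembled from the personal data memory.
context = {"pm10_days_above_50": 23, "spo2": 95}
print(evaluate_rules(context))  # ['warn_not_to_go_outside']
```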

As an example of how the rule engine works, let us look at air quality and its effect on COPD patients. Medical guidelines specify that it is unhealthy for COPD patients to go outside when the outside air PM10 count exceeds 50 for more than 20 days in a row. Thus, if KSERA detects that the outside air quality is unhealthy, it should warn the patient not to go outside. This is implemented in the rule engine as follows. First, the ubiquitous monitoring system (UMS) detects that the air quality outside is unhealthy through trend analysis (Fig. 7).

Fig. 7 Example of rule engine detecting unhealthy PM10 levels according to medical guidelines

The patient’s intention to venture outdoors could be detected by the smart home sensing the coat closet had been opened and the Nao robot asking if the patient intended to go outside. Once the system detects that these two conditions exist (i.e., bad air quality and the patient is planning on going outside), the rule engine decides what action the robot will take. In this case, according to medical guidelines from domain experts, the action is to warn the patient in a friendly way to shorten or even avoid going outdoors.

3.4 SAR Mobile and Interaction Behaviors

3.4.1 Mobile Behavior

In the majority of the scenarios the robot needs to reach the user or a position in the home environment, and the mutual positioning of the user and the robot influences HRI. In order to realize the robot’s mobile behavior, the KSERA system simultaneously localizes the user and the robot. As the Nao’s camera resolution is too low and its view is often occluded in domestic environments, we chose to realize localization by means of a ceiling-mounted camera. This way the entire room is visible in one stable image, the camera is not occluded, and the solution is simple and cheap to implement.

The localization algorithm consists of two parts: (1) person localization with a hybrid probabilistic model and (2) robot localization based on particle filter prediction [6567]. The person localization is designed for (but not constrained to) tracking only a single person because the KSERA system is generally not required when a visitor or care-giver is visiting the person.

Navigation uses the position information (the robot’s and the person’s poses) from the localization module and a model of the human’s personal space as inputs for determining where to move the robot. The human’s personal space with respect to the Nao robot is larger than with respect to another human and can be approximated with second-order polynomial functions [54, 55]. The choice of the target position and of when to start a navigation behavior is made by the rule engine, which interprets the environmental parameters and external events and triggers the appropriate robotic mobile behaviors. For example, as shown in Fig. 8, after the robot has reached the target human and the human has performed a measurement with the pulse-oximeter, the rule engine interprets the measurement and, according to its value, sends the robot to the projection position to show health exercises or starts a video connection with the caregiver.
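
The sketch below illustrates how such a personal-space model can be turned into a navigation target: the comfortable stopping distance is modeled as a second-order polynomial of the approach angle, and the goal position is placed just outside that distance. The polynomial coefficients and the safety margin are illustrative values, not the parameters fitted in [54, 55].

```python
# Sketch: choose a navigation target that respects the user's personal
# space, modeled as a second-order polynomial of the approach angle
# (0 rad = frontal approach). Coefficients are illustrative only.
import math

def comfortable_distance(angle_rad):
    """Personal-space radius (m) as a quadratic function of approach angle."""
    a, b, c = -0.05, 0.0, 1.2   # illustrative coefficients
    return a * angle_rad**2 + b * angle_rad + c

def approach_target(person_xy, person_heading, approach_angle):
    """Goal position just outside the personal-space boundary."""
    distance = comfortable_distance(approach_angle) + 0.1  # 10 cm safety margin
    world_angle = person_heading + approach_angle
    return (person_xy[0] + distance * math.cos(world_angle),
            person_xy[1] + distance * math.sin(world_angle))

# Example: person at (2.0, 3.0) facing along +x, robot approaches from 30 degrees.
print(approach_target((2.0, 3.0), 0.0, math.radians(30)))
```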

Fig. 8 Unified Modeling Language (UML) activity diagram that depicts the typical flow of the robot’s mobile behavior

The architecture of the KSERA system allows the integration of multiple navigation algorithms. In particular, we have developed two state-of-the-art navigation models, behavior-based navigation and map-based navigation, for solving problems in different situations. The behavior-based navigation provides a smart obstacle-avoidance strategy that allows the robot to escape from a U-shaped trap using only two forward-directed sonar sensors, and the map-based navigation method learns spatial knowledge by observing the human’s movements in a room and controls the robot’s navigation behavior with a sensorimotor representation trained during map building [53–55, 64, 67].

3.4.2 Dialog Management System

Humans use gestures and facial expressions along with speech to communicate. Thus, it seems natural that the Nao should use gestures and facial expressions when communicating with the human. The Nao’s face is made from hard plastic, so it cannot be used for facial expressions. However, the eyes of the Nao are composed of a black pupil surrounded by LED lights, which we used instead of facial expressions. We developed a dialog state machine to combine dialog with gestures and eye LED signals. The dialog state machine and its components are described in the following paragraphs.

Speech Recognition and Generation

For speech recognition, we use Sphinx 4.0 from Carnegie Mellon University [60], an automatic speech recognition (ASR) system based on Hidden Markov Models. We use Sphinx to recognize simple answers from the user such as yes and no, but it can easily be expanded to understand more complex utterances. A ceiling-mounted microphone is used for speech recognition because the quality of the Nao robot’s own microphone is currently too low for practical indoor applications. Nao has a built-in text-to-speech converter, which the robot uses to produce utterances.
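
KSERA itself used the Java-based Sphinx 4.0; purely for illustration, the sketch below shows the same yes/no recognition pattern with the Python speech_recognition package and its PocketSphinx backend, which is a stand-in rather than the project’s implementation.

```python
# Illustration only: recognize a simple yes/no answer from a room
# microphone using the speech_recognition package (PocketSphinx backend)
# as a stand-in for the Java-based Sphinx 4.0 used in KSERA.
import speech_recognition as sr

def listen_for_yes_no(timeout_s=5.0):
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:               # ceiling-mounted microphone
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source, timeout=timeout_s)
    try:
        text = recognizer.recognize_sphinx(audio).lower()
    except sr.UnknownValueError:                  # unintelligible answer
        return None
    if "yes" in text:
        return "yes"
    if "no" in text:
        return "no"
    return None

if __name__ == "__main__":
    print("User said:", listen_for_yes_no())
```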

Gestures

People often accompany speech with gestures [5]. Therefore, it seems reasonable that HRI should also include gestures. Human gesticulation varies from culture to culture. Gestures in one culture can mean something different in another. No consensus has been reached on what gestures are appropriate for robots to use. We chose gestures that we felt would be recognized by European senior citizens. The Nao can make the following gestures to accompany what it says: to get the person’s attention, Nao waves both arms; when asking the person to do something, Nao twists its right arm and shows the palm of its right hand to the person; when it does not know something, Nao shrugs with its palms up; when asking a question, Nao twists both arms and shows the palms of its hands to the person; to show happiness, Nao nods its head up and down once; and to show sadness, Nao shakes its head right to left once.

Imitating Emotions

Different colored LED patterns around the eyes are used to imitate human emotions when the Nao robot speaks. The LED patterns were developed from the results of two experiments [27]. In the first experiment, 81 subjects were asked which color and intensity best fit each of six emotions: anger, disgust, fear, happiness, sadness, and surprise [18]. They were then asked which of three pairs of lines was best associated with the given emotion; the three pairs of lines differed in frequency, sharpness, and orientation, respectively. To provide a common reference point, the subjects were shown standard images used for the different emotions [19]. This information was used to design two LED patterns for each emotion: (1) flashing lights with varying periods and rise/fall times and (2) cartoon patterns (e.g., tear drops for sadness). In the second experiment, 98 subjects were asked to identify which of the six emotions each of the twelve LED patterns represented. The subjects ranged in age from 14 to 51 years, none were color blind, none knew the purpose of the study, and all had a chance to win a gift coupon to encourage participation. The research was conducted through an internet survey. The subjects were shown short video clips of the Nao robot head displaying one of the twelve LED patterns, presented in random order. After each clip, the subjects were shown a display with slider bars for each emotion and asked to what extent Nao was expressing each of the emotions in the clip. They were told that moving a slider to the left meant Nao was not expressing this emotion at all and moving it to the right meant Nao was expressing this emotion a lot. The subjects could use the slider bars to indicate that Nao was expressing more than one emotion. The results showed that the Nao robot can use LED patterns to imitate emotions; however, the LED patterns that were recognized as an emotion were not necessarily the ones we expected. Overall, cartoon patterns appear to be more recognizable than flashing lights with varying periods and rise/fall times. Ekman and Friesen [18] proposed that humans recognize emotions in other humans by observing their facial expressions; although they proposed that certain facial expressions are universally recognized as specific emotions, others have disagreed [35]. Since it is well known that people respond faster to consistent cues, we created three LED patterns for the emotions surprise, happiness, and sadness, and a fourth pattern of blinking white LEDs to portray no emotion.
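
The sketch below shows one way the two kinds of patterns (cartoon patterns and flashing lights with varying periods and rise/fall times) could be encoded as data for playback. The colors, periods, and shape names are illustrative placeholders, not the experimentally derived patterns from [27].

```python
# Sketch: encode eye-LED emotion patterns as data. Values are
# illustrative placeholders, not the patterns derived in [27].
from dataclasses import dataclass

@dataclass
class FlashingPattern:
    color_rgb: tuple      # LED color
    period_s: float       # full on/off cycle
    rise_fall_s: float    # brightness ramp time

@dataclass
class CartoonPattern:
    color_rgb: tuple
    shape: str            # which LED segments are lit, e.g. "tear_drop"

EMOTION_PATTERNS = {
    "sadness":   CartoonPattern(color_rgb=(0, 0, 255), shape="tear_drop"),
    "happiness": CartoonPattern(color_rgb=(0, 255, 0), shape="upper_arc"),
    "surprise":  FlashingPattern(color_rgb=(255, 255, 0), period_s=0.4, rise_fall_s=0.05),
    "neutral":   FlashingPattern(color_rgb=(255, 255, 255), period_s=1.0, rise_fall_s=0.3),
}

def pattern_for(emotion):
    """Look up the LED pattern the robot should display while speaking."""
    return EMOTION_PATTERNS.get(emotion, EMOTION_PATTERNS["neutral"])

print(pattern_for("sadness"))
```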

Dialog State Machine

As shown in Fig. 9, the dialog state machine has four states: START, DO_QUESTION, MESSAGE, and GET_ANSWER. The START state initializes the link with the robot and reads the dialog file, which contains the text of the questions to ask and the messages to deliver. It also contains the logical sequence of how to proceed from one question or message to another based on the answers that are given, as well as the accompanying gestures and facial LED patterns. Once the initialization is done, the START state reads the next entry of the dialog file and determines whether a question should be asked, a message should be delivered, or the state machine should end.

Fig. 9 Dialog state machine

The DO_QUESTION and MESSAGE states take care of sending the proper commands to the robot so that the speech, gestures, and LED patterns are performed in the correct way. The MESSAGE state either succeeds and goes back to the START state, or it fails and the state machine ends. If a question was asked, the robot first needs to listen for a reply, so the DO_QUESTION state proceeds to the GET_ANSWER state. The speech, gestures, and LED patterns are played back in parallel, starting at the same time. The GET_ANSWER state invokes the Sphinx ASR and waits for a reply. If the answer is not intelligible the question is repeated up to three times. As soon as a proper answer is given, the state machine proceeds to the START state with the reply.
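
A minimal sketch of this four-state flow follows. The dialog-entry format and the stub robot and ASR classes are illustrative placeholders, not the KSERA components.

```python
# Minimal sketch of the four-state dialog machine (START, DO_QUESTION,
# MESSAGE, GET_ANSWER). Entry format and stubs are illustrative only.
MAX_RETRIES = 3

def run_dialog(entries, robot, asr):
    index = 0                                       # START: read the first entry
    while index is not None and index < len(entries):
        entry = entries[index]
        # DO_QUESTION / MESSAGE: speech, gesture, and LED pattern start in parallel.
        robot.perform(entry["text"], entry["gesture"], entry["leds"])
        if entry["type"] == "message":
            index = entry.get("next")               # back to START, or end
            continue
        # GET_ANSWER: invoke the ASR; repeat the question up to three times.
        answer = asr.listen_for_yes_no()
        retries = 0
        while answer is None and retries < MAX_RETRIES:
            robot.perform(entry["text"], entry["gesture"], entry["leds"])
            answer = asr.listen_for_yes_no()
            retries += 1
        index = entry["next"].get(answer)           # proceed based on the reply

class RobotStub:
    def perform(self, text, gesture, leds):
        print(f"Nao says {text!r} with gesture {gesture!r} and LED pattern {leds!r}")

class AsrStub:
    def __init__(self, answers):
        self._answers = iter(answers)
    def listen_for_yes_no(self):
        return next(self._answers, None)

entries = [
    {"type": "question", "text": "Shall we do your breathing exercise?",
     "gesture": "ask", "leds": "neutral", "next": {"yes": 1, "no": None, None: None}},
    {"type": "message", "text": "Great, let us start!",
     "gesture": "happy", "leds": "happiness", "next": None},
]
run_dialog(entries, RobotStub(), AsrStub(["yes"]))
```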

3.4.3 Eye Contact and Joint Attention

Gaze has a regulating function for turn-taking during conversation. It is important to know when it is one’s turn to talk, or when another is finished talking. Kendon [28] found that people tend to end their turn by looking at the other person approximately 80 % of the time. The other person, when taking his or her turn, will look away from the turn-giver approximately 80 % of the time. Although the data varied strongly across individuals, Kendon [28] also found that most people gaze away most of the time when talking, whereas listeners predominantly gaze directly at the speaker. The reason people tend to gaze away while talking is, as with turn-taking, cognitive: when speech is less fluent, it is more common to look away to minimize the visual cognitive load and use all resources for finding the right words and structure. For HRI, appropriate mutual gaze is therefore important.

This requires face detection, for which we use Nao’s built-in algorithm [3]. Face tracking is active during navigation so that the robot looks at the person while it is approaching. This provides much better non-verbal feedback about what the robot’s intentions are, i.e., approaching the person. As soon as a face is detected the robot’s eye LEDs blink green providing further feedback about what the robot is doing.

As soon as a face is detected the robot estimates the person’s head pose relative to itself. In particular it detects whether the person is looking at the robot [57, 58, 64]. Ideally, one might want to estimate the gaze direction including eye turn and not just head turn. However, the camera resolution and refresh rate make this impossible, especially because the person is typically much further away from the robot (about 1.5 m) than during a typical webcam application (about 50 cm). It turns out that this is not a problem in practice because people do not usually look at the robot from the corners of their eyes.

Because continuously looking at the user might elicit an uncomfortable feeling, Nao needs to be able to look away from the person at times during the interaction. In the current implementation, Nao looks away after 50 frames of successfully tracking the face. To direct Nao’s gaze away, we mimic the fact that, in human perception, motion easily attracts visual attention. This behavior is implemented as follows. We start by determining the distribution of movement within the image collected from Nao. The difference between successive frames is used to determine the edges and corners of objects that are moving. A moving object is found when 30 or more “good features to track” (points in the image that are edges or corners) are found. Old good features to track are discarded and new ones are created every frame. These features are looked for in every new frame, which allows us to determine the new position of the object. The minimal movement required to keep a point in the list is set to 1.5 pixels. Every point has an associated weight, which is adjusted according to the point’s movement relative to this minimal movement. The weight, together with the total existence time (the maximum is 15 frames), determines whether a point is allowed to remain in the list or should be removed. With this algorithm, larger objects that move faster and objects with more edges and corners are more likely to be detected. In this way, the Nao averts its gaze by momentarily looking at a larger, faster-moving object in the visual field around the person’s face (e.g., a bird flying by the window behind the person).
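
The sketch below approximates this gaze-aversion trigger with OpenCV: frame differencing highlights moving edges and corners, “good features to track” are extracted and tracked into the current frame, and a gaze target is returned only when enough features move more than the 1.5-pixel threshold. Camera handling and the exact weighting scheme are simplified assumptions, not the KSERA implementation.

```python
# Sketch: detect a moving object near the user's face by tracking
# "good features to track" between frames (thresholds follow the text:
# 30 features, 1.5 px minimal movement). Camera use is illustrative.
import cv2
import numpy as np

MIN_FEATURES = 30        # features needed to declare a moving object
MIN_MOVEMENT_PX = 1.5    # minimal displacement to keep a feature

def find_moving_object(prev_gray, curr_gray):
    """Return the centroid of a moving object, or None."""
    diff = cv2.absdiff(prev_gray, curr_gray)            # moving edges/corners
    corners = cv2.goodFeaturesToTrack(diff, maxCorners=100,
                                      qualityLevel=0.01, minDistance=5)
    if corners is None:
        return None
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   corners, None)
    moved = []
    for p0, p1, ok in zip(corners.reshape(-1, 2),
                          next_pts.reshape(-1, 2), status.ravel()):
        if ok and np.linalg.norm(p1 - p0) >= MIN_MOVEMENT_PX:
            moved.append(p1)
    if len(moved) < MIN_FEATURES:
        return None                       # nothing salient enough to look at
    return np.mean(moved, axis=0)         # gaze target: centroid of the motion

cap = cv2.VideoCapture(0)                 # stand-in for Nao's head camera
_, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    target = find_moving_object(prev_gray, gray)
    if target is not None:
        print("avert gaze toward", target)   # the robot would turn its head here
    prev_gray = gray
```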

3.5 Video Communication with Family Members and Care Givers

We equipped the Nao robot with a LED video projector (a.k.a. a beamer) to display text, graphics, and video for the user on a wall [17]. Examples of video output are playing a pre-recorded video or a video phone communication. Figure 10 shows an example of the Nao using the small LED video projector mounted on its back and its head camera for video communication between the user (sitting on a couch) and a call center operator.

Fig. 10 Example of Nao-facilitated video communication

Workshops with secondary users (i.e., external experts from the care domain) were held at the Vienna University of Technology to assess the overall KSERA system and, in particular, the LED projector capabilities. The experts rated the quality of the audio and video connections as good and sufficient for emergency scenarios and for social communication. The added value of the KSERA mobile video communication was clearly seen in comparison to state-of-the-art video communication (e.g., via Skype™). The possibility of quickly assessing a possible emergency was seen as very useful. The experts also saw definite added value in the system as a means to improve an older person’s integration into a social network.

Other projects have used touch screen terminals for this purpose [34, 39, 41, 59], but this is not appropriate for KSERA because the robot is not strong enough to carry standard terminal equipment and not tall enough to present smaller displays, such as smart phones or PDAs, at a suitable height for a sitting or standing user [44].

3.6 KSERA Example

The sensor data collected by the UMS is used to compute intelligent, proactive, and context-sensitive care actions. An example is exercise. COPD care protocols set the difficulty level of exercises based on the patient’s SpO2 levels. Let us assume that the KSERA rules engine has determined that it is time for the patient to perform exercises. First the system needs to get the patient’s current SpO2 levels, so it can determine the difficulty level of exercises to ask the patient to perform. As described earlier, the localization software locates the robot and the patient in the room and then the navigation software moves the robot to the patient. The robot approaches the patient using the proxemic behavior discussed earlier and stops before violating the patient’s personal space. After stopping, it uses the eye contact and joint attention behavior discussed earlier to get the patient’s attention. Then, it uses speech accompanied with gestures and imitated emotions to ask the patient to read the SpO2 value using the pulse-oximeter. Then, the robot uses speech recognition to understand the patient’s answer. The rule engine will determine the robot’s next action based on the patient’s response. Let us assume the patient agrees to use the pulse-oximeter. The pulse-oximeter reads the patient’s SpO2 value and sends the value to the UMS over a Bluetooth connection. The rule engine will decide the difficulty level of exercises based on the patient’s SpO2 levels. Then, the rule engine will direct the robot to lead the patient in the exercises. The robot will then use speech, gestures, imitated emotions, and arm movements to lead the patient in performing the exercises.
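
For illustration, the sketch below strings this exercise scenario together as a single orchestration. All component interfaces, the stub classes, and the SpO2 thresholds are illustrative placeholders, not the actual COPD care-protocol values or KSERA modules.

```python
# Sketch of the exercise scenario as an orchestration of the components
# described above. Interfaces, stubs, and thresholds are illustrative.
class StubOximeter:
    def read_spo2(self):            # value would arrive over Bluetooth
        return 95

class StubRobot:
    def approach_and_greet(self):
        print("Robot navigates to the patient, makes eye contact, waves.")
    def ask(self, text):
        print("Robot asks:", text)
        return "yes"                # answer would come from the ASR
    def lead_exercises(self, difficulty):
        print(f"Robot leads {difficulty} exercises with speech and gestures.")

def exercise_difficulty(spo2):
    """Toy stand-in for the rule engine's care-protocol lookup."""
    if spo2 < 90:
        return None                 # too low: escalate instead of exercising
    return "light" if spo2 < 94 else "normal"

def run_exercise_session(robot, oximeter):
    robot.approach_and_greet()
    if robot.ask("May I take an oxygen reading before we exercise?") != "yes":
        return
    difficulty = exercise_difficulty(oximeter.read_spo2())
    if difficulty is None:
        robot.ask("Your oxygen level is low; shall I contact the call center?")
        return
    robot.lead_exercises(difficulty)

run_exercise_session(StubRobot(), StubOximeter())
```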

4 Discussion and Conclusions

The KSERA project achieved the goal of integrating smart home technology and socially-assistive robots to allow the extension of independent living for elderly people. The social robot is the most visible component of the system playing the role of communication interface between the elderly, the smart home, and the external world. Other projects have integrated SARs with smart home technology to provide assistance to elderly users, but none have used a humanoid robot. For example, ExCITE enhanced a robotic platform, Giraff, for telepresence with features enabling social interaction from a domestic environment to the outside world [12]. Giraff is a remotely controlled mobile, human-height physical avatar integrated with a videoconferencing system (including a camera, display, speaker and microphone). It is powered by motors that can move it in any direction on its wheeled base. The Nao robot, on the other hand, is a completely autonomous bi-pedal humanoid robot.

To ensure user acceptance, we enhanced Nao’s behavior with person- and self-localization abilities, person-aware navigation, speech recognition and generation, robot gestures, emulated emotions, eye contact and joint attention, and audio-video communication with family members and care givers. We conducted a number of laboratory tests to determine the best way to implement these enhancements. User experience studies and field trials verified that these enhancements did indeed improve user experience and the user acceptance of Nao. The results showed that (1) the KSERA system and Nao robot are likeable, (2) the attitude toward the Nao robot is highly correlated with the attitude toward the system, and (3) communication through a robot is preferred over interaction with the individual technical elements of a smart home.

As in many robotic projects, the implementation of certain functionalities of the robotic platform required innovative technologies, and KSERA was no different. The robot needed to be cheap and able to navigate in domestic environments without expensive laser range finders, which resulted in new person-aware and map-free navigation algorithms. What is different is the set of non-functional requirements: people have expectations about robots in terms of their social behavior and not just their functionality. In KSERA this was realized by implementing eye contact and validated gestures to accompany speech. If anything, the results from KSERA show the large impact of these non-functional social behaviors on user experience and user acceptance.

Techniques to overcome several limitations of the Nao robot have been developed: off-board systems augment its on-board capabilities; a ceiling-mounted camera provides user and robot localization; a rule-based intelligent server controls the behavior of the robot based on medical and environmental information; medical and smart home sensors gather information about the user; external sources gather outside environmental information (e.g., pollution counts and weather); domotics control appliances in the smart home; a LED video projector mounted on the robot displays text, graphics, and video for the user on a wall; a ceiling-mounted microphone and off-board ASR recognize speech; and an off-board processor controls the dialog, gestures, eye LEDs, and gaze of the robot.