Using socially assistive robots for monitoring and preventing frailty among older adults: a study on usability and user experience challenges

Socially assistive robots can play an important role in the monitoring and training of health of older adults. But before their benefits can be reaped, proper usability and a positive user experience need to be ensured. In this study, we tested the usability and user experience of a socially assistive robot (the NAO humanoid robot) to monitor and train the health of frail older adults. They were asked to complete a set of health monitoring and physical training tasks, once provided by the NAO robot, and once provided by a Tablet PC application (as a reference technology). After using each technology, they completed the System Usability Scale for usability, and a set of rating scales for perceived usefulness, enjoyment, and control. Finally, we questioned the participants’ preference for one of the technologies. All interactions were recorded on video and scrutinized for usability issues. Twenty older adults participated. They awarded both technologies ‘average’ usability scores. Perceived usefulness and enjoyment were rated as very positive for both modalities; control was scored positively. Main usability issues for NAO for these tasks were related to speech interaction (e.g., NAO’s limited speech library, NAO’s difficulty to cope with Dutch dialect), older adults’ difficulties with taking their proper role in human-robot interaction, and a lack of affordances of NAO. Seven participants preferred NAO: it was easier to use and more personal. Social robots have the potential to monitor and train the health of frail older adults, but some critical usability challenges need to be overcome first.


Introduction
The first law of robotics, as taken from Asimov's famous novel I, Robot, states that “a robot may not injure a human being or, through inaction, allow a human being to come to harm” [1]. Within the healthcare context, social robots are usually expected to do a bit more and are used to increase the health of human beings. Social robots, in this context, are referred to as socially assistive robots: robots that aim to foster “close and effective interaction with a human user for the purpose of giving assistance and achieving measurable progress in convalescence, rehabilitation, learning, etc” [2]. Such robots can be used in a wide range of tasks. For elderly care, they are used to, for example, bathe people, provide companionship, monitor health, and monitor falls [3]. And the first studies that delved into the effectiveness of using social robots for these goals show positive effects [4,5]. However, a myriad of factors act as prerequisites for successful acceptance of socially assistive robots among older adults and need to be accounted for during design and implementation, including ease of use, enjoyment, and controllability [6,7].
Frailty, a situation in which a person (most often an older adult) is “at increased risk for future poor clinical outcomes, such as development of disability, dementia, falls, hospitalization, institutionalization, or increased mortality” [8], may lend itself excellently to the use of socially assistive robots. As frailty is made up of many dimensions (such as decline in physical or cognitive condition, or malnutrition [9]) which deteriorate gradually, and whereby the older adult does not notice the development of an unhealthy situation, identification of frailty is important to prevent the negative consequences of being frail. And in a situation where frailty or the first signs thereof have become a reality, a person's health needs to be closely monitored, and health training and education need to be provided. Since frailty is a quite recently discovered but highly prevalent phenomenon (the percentage of community-dwelling adults showing the first signs of frailty ranges from 30.4% to 44.9% in ten European countries, while the frail group ranges from 1.3% to 5.9% [10]), social robots may be an engaging, cost-effective means to monitor and train the health of older adults who live in care homes (where frailty can be considered to be highly prevalent [11]).
Before a socially assistive robot can be helpful in frailty care, proper usability and a positive user experience need to be ensured. In this article, we report on a study that aimed to uncover the usability and user experience issues that socially assistive robot design needs to overcome in order to be an effective and well-accepted means among older adults for identifying and monitoring frailty and for providing health training. We also determined older adults' acceptance of both technologies. The results of this study will allow us to understand what hinders effective human-robot interaction among older adults that use this technology for health purposes, and what influences their decision to use them. For policy makers and robot designers, such information is crucial when deciding whether or not to use social robots for frailty screening, monitoring and prevention, and how to design such technology in order to optimize usability and the user experience.

Theoretical background
Human-robot interaction can be perceived from either the robot's or the human's point of view. One can perceive the robot to be an entity by itself, and human-robot interaction serves to fulfil the robot's needs (in which case needs are pre-programmed by the design team, and can be, for example, the need to extract knowledge from a human in order to complete a user model). When considered from the human point of view, human-robot interaction focuses on how a robot can complete a task in an acceptable and comfortable manner [12]. The second interpretation includes two aspects. On the one hand, it highlights the concept of acceptance, which is often studied by means of the Technology Acceptance Model [e.g., 13] or the Unified Theory of Acceptance and Use of Technology [e.g., 14], and deals with the identification of factors that explain the intention to use a robot, such as the aforementioned ease of use, enjoyment, and controllability. On the other hand, it highlights the concept of usability.
Usability is defined as “the effectiveness, efficiency and satisfaction with which specified users achieve specified goals in particular environments” [15]. Usability engineering has a long tradition in product design and human-computer interaction. This tradition has resulted in a rich methodological toolkit that supports researchers in identifying issues that hinder good usability, such as thinking aloud (whereby an end user is asked to interact with a technology while constantly voicing his/her thoughts out loud) [16], heuristic evaluation (whereby evaluators are asked to judge an interface with the aid of a set of design guidelines) [17], and cognitive walkthroughs (in which experts are asked to 'walk through' a computer system or website as a normal user and report any shortcomings) [18]. Besides methods to elicit usability issues, there is also a range of tools to benchmark the usability of a given technology, of which the System Usability Scale (a ten-item questionnaire) [19] is the most widely used one. Usability tests of social robots or socially assistive robots are quite hard to find in the scientific literature. Fischinger and colleagues [20] tested the usability of a socially assistive robot that aims to prevent falls, and to detect and handle emergencies among older adults, and identified several improvements that needed to be made so as to improve switching between input modalities. Other studies were mainly designed to assess usability metrics (like task completion time and number of errors) [21]. However, it has been stated that it is difficult to define a set of common metrics and instruments to assess the quality of human-robot interaction, as the range of robots with which people can interact is incredibly diverse [22,23].
The user experience is a concept that has recently gained a lot of attention in research and design. It deals with the cognitive, socio-cognitive and affective aspects a person experiences while interacting with a product or technology, like enjoyment, aesthetics, and a desire for repeated use [24]. So, where usability focuses on the pragmatic qualities of a technology or product, the user experience is concerned with its hedonic qualities and people's reactions after a period of usage [25,26]. A variety of factors that potentially contribute to the user experience of interacting with a social robot have been explored in previous studies. Examples include the congruence between robot and end-user personality [27], empathy [28], and appearance [29]. However, a well-researched and widely accepted model of the user experience of social robots is currently lacking. The experience of users while interacting with a social robot can best be assessed by using a combination of qualitative data collection during interaction (such as thinking-aloud) and a quantitative post-interaction data collection method (such as interviews or questionnaires) [30].

Method
We assessed the usability and user experience of a socially assistive robot by means of observing older adults interacting with a robot, programmed to question their frailty status and to explain physical exercises. Afterwards, they were questioned about acceptance, usability and a set of user experience factors. In order to have a reference point for interpreting the results, we asked them to perform the same screening and exercise tasks by means of a tablet PC application.

Participant recruitment
We recruited 20 older adults, aged 70 years or older, via an organization for elderly care in the region of Twente in the Netherlands. Participants needed to speak Dutch fluently, and they were excluded from participation when they had physical impairments that posed a safety risk for doing the exercises as instructed by the socially assistive robot or Tablet PC application. Finally, a participant needed to be either frail or pre-frail (a state in which the first signs of frailty are present), so that administering the screening instrument and providing physical exercises would make sense. This verdict was given by the care team of a potential participant.

Procedure
Each individual test was started by explaining the goal of the evaluation to the participant, after which basic demographics were assessed. Then, a participant was asked to interact with the socially assistive robot or the tablet PC application, which were offered in random order. Both technologies provided a module to monitor the frailty status of the older adult, as well as a module to instruct older adults in doing physical exercises (as deterioration of the physical condition is an important aspect of frailty, and physical exercises are an important part of frailty treatment [31]). The monitoring module consisted of the SARC-F, a questionnaire to screen for sarcopenia [32]. The exercising module consisted of four physical exercises, taken from the OTAGO program (which aims at preventing falls by improving physical strength) [33]. They are: (1) Stretching for shoulder, (2) Walking and turning around, (3) One leg stand (no support), (4) Knee bends (hold support). After completion of both modules, the participant completed a set of usability and user experience questionnaires. Then, the participant interacted with the other technology, performed the same tasks, and completed the same questionnaires (albeit for the different technology). At the end of the session, we asked whether the participant preferred the socially assistive robot or the tablet PC application, as well as their rationale for this choice.

Technology
The socially assistive robot we used during the tests was the NAO humanoid robot (SoftBank Robotics). NAO was programmed to read aloud and interpret the monitoring questions belonging to the SARC-F questionnaire, and could perform the four different physical exercises, after which the robot asked the participant to repeat each exercise. See Fig. 1 for a photo of the NAO robot. The tablet PC which we used was a Samsung TabPRO (SM-T520), with a screen size of 10.1 inches. The questionnaire and exercises were provided in a web environment, opened within Firefox. This web environment was optimized for use by older adults (e.g., large buttons and fonts were used), see Fig. 2.

Data collection
At the start of each session, we interviewed participants about demographics (gender, birth date, living situation, cognitive and physical impairments). After interacting with each technology, we assessed the usability of the technology by means of the System Usability Scale (SUS), the preferred method for assessing this factor [26,34-36]; perceived usefulness via three statements and a five-point Likert scale [37]; and two user experience factors: enjoyment and control. Enjoyment can be defined as “the extent to which the activity of using the device is perceived to be enjoyable in its own right, apart from any performance consequences that may be anticipated” [38], while control is “the extent to which a user can bring about or prevent particular actions or states of the system if she has the goal of doing so” [39]. The enjoyment scale, assessed by means of a five-point semantic differential scale, was based upon van der Heijden [40], while the control scale, assessed via a five-point Likert scale, was based upon van Velsen et al. [41].
Via a short interview, we questioned a participant's preference for either NAO or the tablet PC and their reasons for this preference. A voice recorder was used to record the conversation. All tasks performed with NAO and the tablet PC application were recorded on video.

Data analysis
Demographics were analyzed by means of descriptive statistics. For the remainder of the analyses, results were split into results for NAO and results for the Tablet. The SUS was analyzed following the standard method. The perceived usefulness, enjoyment and control scales were analyzed on a per-item and scale basis (mean score and standard deviation). Responses on the enjoyment scale were recoded for easier interpretation: higher scores now denote a positive evaluation of enjoyment while using the Tablet or NAO. Paired-samples t-tests were used to test for significant differences between the scale averages. Video and audio recordings of the participants interacting with NAO and the Tablet were scrutinized for usability issues (issues that hinder effective use, efficient use, and/or user satisfaction). A rehabilitation physician assessed whether or not the physical exercises that were performed during the test were done correctly. Based upon van Velsen et al. [42], issues were given a severity rating by means of the following rules:
• Critical problems prevented participants from completing tasks and/or recurred across all participants.
• Serious problems severely increased the task completion time and/or recurred frequently across participants. However, a serious problem did not prevent a participant from completing the task eventually.
• Minor problems increased task completion time slightly and/or recurred infrequently across the evaluation participants. A minor problem did not prevent the evaluation participants from completing a test task easily.
The numbers of critical, serious, and minor problems were compared between NAO and the Tablet. Finally, short interview recordings were transcribed, preferences were determined, and reasons for preferences were thematically grouped.
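The standard SUS scoring procedure referred to in this section can be sketched as follows. This is a minimal illustration and not the analysis script used in the study; the function name sus_score and the example responses are hypothetical. Each of the ten items is answered on a 1 to 5 scale. Odd-numbered items contribute their response minus 1, even-numbered items contribute 5 minus their response, and the sum is multiplied by 2.5 to obtain a score between 0 and 100.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant.

    `responses` holds the ten item responses in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd-numbered items are positively worded.
            total += response - 1
        else:
            # Even-numbered items are negatively worded, so they are reversed.
            total += 5 - response
    return total * 2.5


# Hypothetical participant who answers neutrally on every item.
neutral_participant = [3] * 10
print(sus_score(neutral_participant))  # 50.0
```

Per-technology SUS results are then obtained by averaging these individual scores over all participants.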

Ethics
As this study was conducted to identify problems with different technologies for monitoring frailty and providing health training to reverse or postpone the development of frailty, and as the participants volunteered to take part and were not forced to answer questions or conduct exercises, approval from a medical ethical committee was not necessary [43]. Before participation, participants were sent an information package by the research team. Before starting a test, the participant completed an informed consent form.

Participants
Twenty older adults (12 males, 8 females) participated. They had a mean age of 78.5 ± 7.1 years. Fifty percent of them lived alone. No one suffered from cognitive impairments. Some participants suffered from small physical impairments but this did not pose a safety risk for doing the exercises.

Quantitative measures
After interacting with each technology, participants completed a survey with the System Usability Scale (SUS), and rating scales that assessed perceived usefulness, control, and enjoyment. Results of this survey can be found in Table 1. For the user experience scales (perceived usefulness, control, and enjoyment), scores range from 1 (lowest) to 5 (highest).
The numbers show that, following the interpretation of SUS scores by Sauro and Lewis [44], the usability of both NAO and the Tablet PC application scores below average (with scores that represent a grade D). Figure 3 shows, however, that opinions about the usability of both technologies differed. Some participants had a positive and some a negative opinion on this point, with one participant being extremely negative about NAO's usability. The scores for perceived usefulness and enjoyment are very positive for both technologies, while the score for control is positive for both NAO and the Tablet PC application. In these cases, opinions also differed, but the majority of the participants were positive about these factors for both technologies (see Fig. 4). T-tests showed that there were no significant differences between the Tablet PC application and NAO with respect to the SUS score (t(19) = .585, p = .566), and the average score for perceived usefulness (t(19) = .64, p = .53), control (t(19) = 1.01, p = .33), and enjoyment (t(19) = .27, p = .79).
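To illustrate how a paired-samples t statistic of this kind is obtained, the sketch below computes it from per-participant scores for the two technologies. It is an illustration only: the function name paired_t and the example scores are hypothetical and are not the study data. The statistic is the mean of the within-participant differences divided by its standard error, with n - 1 degrees of freedom, which for 20 participants gives the 19 degrees of freedom reported above.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, degrees_of_freedom) for a paired-samples t-test.

    `first` and `second` hold one score per participant for each
    condition, in the same participant order.
    """
    if len(first) != len(second):
        raise ValueError("both conditions need one score per participant")
    if len(first) < 2:
        raise ValueError("at least two participants are required")

    # Within-participant differences between the two conditions.
    differences = [a - b for a, b in zip(first, second)]
    mean_difference = statistics.mean(differences)
    sd_difference = statistics.stdev(differences)  # sample SD (n - 1)
    if sd_difference == 0:
        raise ValueError("t is undefined when all differences are identical")

    standard_error = sd_difference / math.sqrt(len(differences))
    return mean_difference / standard_error, len(differences) - 1


# Hypothetical scale averages for four participants (not the study data).
robot_scores = [4, 5, 3, 4]
tablet_scores = [3, 3, 2, 4]
t_value, df = paired_t(robot_scores, tablet_scores)
print(round(t_value, 3), df)
```

The corresponding p-value follows from the t distribution with the returned degrees of freedom, which a statistics package would normally supply.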

Usability issues
The usability issues that we identified can be divided over three categories: issues that hinder proper monitoring, issues that hinder proper exercising, and finally, general issues (i.e., usability issues that occur in both modules). Table 2 shows the usability issues that we identified while viewing participants using NAO and the Tablet PC application, and that are not specific to monitoring or exercising. For every problem, we noted how often it occurred during all interactions, how many participants experienced this issue, and what priority level we assigned to it. An issue could occur multiple times within a single session. There were three general, major usability issues that occurred when participants interacted with NAO. All three were related to the human-robot interaction using speech. First and foremost, participants answered too softly for NAO to hear them (correctly). Second, the participants were unable to hear NAO, and third, they answered too soon (i.e., before NAO completed its speech, so that the participant's answer could not be correctly interpreted by the robot). With regard to interacting with the Tablet PC, we identified one general, critical usability issue, namely that participants had problems with touching the tablet's touchscreen correctly.

Issues hindering proper monitoring
A specific set of usability issues was identified when observing participants interact with NAO or the Tablet PC application for the purpose of monitoring health (for which they completed the SARC-F questionnaire). An overview of these issues is presented in Table 3. The most frequently encountered problem when interacting with NAO was that the robot was unable to understand the answer given to it. This could be because the answer the older adult provided was not included in NAO's speech library, or because the older adult spoke with a regional Dutch accent that NAO could not understand. Conversely, older adults could often not understand NAO or did not know what to answer (for example, when, for them, NAO talked too quickly, after which they could not remember the different answering possibilities). When interacting with the Tablet PC, only one critical issue surfaced. Here, some of the older adults were unable to select the right answering possibility. They simply did not know they could press an answering option on the screen, pressed too hard or too long (which caused the tablet to malfunction), or pressed an answering option with their fingernail, which the tablet did not recognize as the selection of an option.

Issues hindering proper exercising
The problems that occurred during exercising (as instructed by NAO or the Tablet PC) can be divided into two categories: problems that occurred over all exercises and exercise-related problems. Table 4 displays the general problems that occurred during exercising. It shows that problems related to a proper distribution of roles (between instructor and the one being instructed) arose quite frequently. This was mainly the case for NAO, but was also observed while older adults interacted with the Tablet PC. Older adults did not understand that NAO first instructed an exercise, did not understand they should repeat an exercise (observed for both NAO and the Tablet PC), or did not understand they should touch NAO's head when done exercising.
We identified a very wide range of exercise-related problems of which we will only discuss the critical and serious ones. With respect to exercise 1 (Stretching the shoulder), participants did not use a wall (serious issue; 8 participants; NAO only), stood too close to a wall (serious issue; 2 participants for NAO; 5 participants for the Tablet PC), did not lift their arms high enough (serious issue; 1 participant; Tablet PC only), or did not keep their arms above their head for the full 10 s (serious issue; 2 participants; Tablet PC only). For the case of exercise 2 (Walking and turning around), in which participants were instructed to walk in the shape of an 8, we observed many erroneous executions. Participants exactly copied NAO and walked with very small steps (critical issue; 11 participants; NAO only) or made a zigzagging movement (critical issue; 8 participants; NAO only). Other erroneous executions of this exercise included making only a few steps (critical issue; 3 participants; NAO only), walking from left to  For exercise 3 (standing on one leg (without support)) and 4 (bending the knee (with support)), we identified only one critical or serious issue per exercise. During exercise three we observed that participants copied the movements of NAO exactly and dragged their feet over the ground (serious issue; 3 participants; NAO only), and during exercise 4, we observed that people did not place their hands on the table (serious issue; 5 participants; NAO only).

Preference for NAO or tablet PC application
At the end of each session, participants indicated whether they preferred using NAO or the tablet for monitoring or training their health. Overall, 13 participants preferred the tablet and 7 participants preferred NAO. Reasons that participants gave for preferring the tablet were:
• The Tablet PC is easier to use (mentioned 5 times).
• The videos in the Tablet PC application make exercising easier, as they are explained more clearly (mentioned 4 times).
• Practicing with NAO is hard if you have a hearing impairment (mentioned 3 times).
• The tablet is smaller than the robot (mentioned 2 times).
• I am already very used to working with a tablet (mentioned 2 times).
• I am not used to working with NAO (mentioned 1 time).
• NAO is something for the future (mentioned 1 time).
Participants who preferred using NAO supplied the following arguments for their preference:
• It is easier to use the robot (mentioned 5 times).
• NAO is more personal, it can be a buddy (mentioned 2 times).
• NAO has no loading time, it is faster (mentioned 1 time).
• NAO shows how to do the exercises (mentioned 1 time).
• NAO tells you how to do the exercises (mentioned 1 time).

Discussion
In this study, we assessed the different usability and user experience challenges that social robots face when being used to monitor and train the health of frail older adults. With respect to usability issues, we found that both the social robot that was being used (NAO) and the Tablet PC application (which was tested as a reference technology) suffered from several usability issues. Participants had difficulty interacting with the social robot as a) they experienced problems while talking with the robot, b) they found it difficult to identify their role during human-robot interaction, and c) they did not have a clear image of the relation between the possible interaction options the social robot provided and the consequences of using one of these options. From observing the interaction between the social robot and older adult, it became clear that the social robot was not technically capable of having a full and rewarding conversation with an older adult. Older adults answered too soon, too softly, too late, or were not able to hear the social robot. The social robot, on the other hand, had difficulty with interpreting the speech of the older adults, which was often in Dutch dialect or included words NAO could not process. Before social robots can be successfully used to monitor and train the health of pre-frail older adults, their speech library, speech, and flexibility for coping with answers that are provided before or after the social robot expects them should be thoroughly improved. This need was also identified in [45], where it was found that the hardware of the NAO robot and its speech library are not of such a quality that they provide good speech interaction. Next, older adults found it difficult to find their role while interacting with the robot: what is expected of them?
During our tests this manifested itself as copying exactly what the social robot does (even if this leads to unnatural movements that hinder the effectiveness of physical exercises) and unsuccessful switching between roles (i.e., the robot first instructs, after which the older adult performs an exercise). Social robots are not a common conversation partner or instructional agent for older adults, which makes it difficult for them to determine how they should interact with a robot and how they should interpret its instructions. Introduction of social robots to support health monitoring and physical exercising should therefore always be preceded by a period of thorough instruction and a trial period, so that older adults can get accustomed to interacting with robots and know 'how to play the game'. Finally, older adults found it difficult to identify interaction options and to understand the consequences of using such an option (i.e., the affordances [46] of the social robot). This manifested itself most prominently in problems when older adults needed to touch the head of the social robot to continue the physical training program: it was an unnatural action and not something they immediately associated with completing an exercise. Social robot design can therefore best refrain from using such interaction methods and focus on a properly working speech interaction or interaction via a touchscreen (as, for example, the Aido Interactive Personal Home Robot by InGen Dynamics allows). This claim is strengthened by the findings of Hebesberger and colleagues [47], who found that when introducing an autonomous robot, its functionalities should be self-explanatory so that the robot can be used without help from a care professional or support staff. The presence of these usability issues was also reflected in the score the robot received on usability (assessed via the System Usability Scale (SUS)), which could be interpreted as unsatisfactory.
This score was similar to the SUS score of the Tablet PC application that was used as a reference technology. These results also suggest that the SUS is indicative of social robot usability, even though the instrument was not initially developed for assessing this type of technology. The user experience scores that were awarded to the social robot (split out into the factors perceived usefulness, control, and enjoyment) were positive to very positive and did not differ significantly from the scores that were awarded to the Tablet PC application. Moreover, about a third of the participants in our study (7 out of 20) preferred the social robot over the Tablet PC application for monitoring and training their health. Mostly, they stated that the robot was easier to use and is a more personal technology. We think these results are highly encouraging. If, with all the flaws that were present in the currently used social robot, our study still showed signs that social robots are an engaging monitoring and training technology for a group of older adults, this acceptance is likely to increase once the quality of social robots for effective and efficient human-robot interaction further improves.

Limitations
The social robot used in this study was NAO, which is only one example of the wide range of social robots that are currently available. As we used only one type of robot, the identified usability issues might not be fully representative of the generic usability issues that social robots currently face. However, NAO is a popular and widely used robot and displays many characteristics that can be found among the current generation of social robots. We therefore think that our inventory gives a good indication of the current state of the art. For a full and definite picture, future research should confirm this claim and should report usability challenges that other social robots face. We focused this study on using social robots for monitoring and preventing frailty among older adults. The results should therefore be taken with caution if one wants to generalize them towards other (geriatric) conditions. For example, a social robot might not be very well suited to support older adults in performing cognitive training exercises. Finally, our study included 20 participants. Although this number is very acceptable for conducting a usability test [48], it might, from a scientific point of view, limit the potential for generalization of our results. Similarly, our local recruitment approach might have introduced some bias in our participant sample (Dutch frail or pre-frail older adults).

Concluding remarks
Social robots are becoming increasingly popular in healthcare for monitoring and health training purposes, especially as they can be an engaging and cost-effective means for doing so. Our study showed that social robots have potential when used for monitoring and training the health of frail older adults, but that there are still some critical usability challenges that need to be overcome first. We are therefore looking forward to studies and technological innovations that tackle the critical usability issues we identified.

Funding
This work is conducted within the context of the IMI SPRINTT (IMI-JU 115621) project.