1 Introduction

The overexertion of health personnel is an increasing problem for the public health care system. While the global demand-based shortage of healthcare workers was 6.5 million in 2013, experts expect the shortage to increase by more than twice that number by 2030 [1]. To close this gap, the deployment of assistive social robots is intended, especially in elderly care [2,3,4].

These assistive social robots can be divided into companion-type and service-type robots [3]. Companion robots focus on providing socio-emotional support or promoting physical exercises [5, 6], e.g., the pet robot PARO [7], AIBO [8], iCAT [9], or NAO [10]. Service robots assist medical facilities with service tasks, such as CASERO [11] or Pearl [12], or with functional tasks, such as ROBEAR, which lifts patients [13]. Recently developed social assistance robots such as Care-O-Bot 4 [14] and GARMI [15] combine practical skills for service operation and companionship. Supporting caregivers in facilities, e.g., by transporting beverages, or performing relevant tasks in home care of elderly people, such as complex medical examinations or mobilization of patients, requires individualized treatment and personalized motion sequences [16]. These movements pose challenges to human safety [17] and therefore require a human-in-the-loop (HIL) approach, which is realized through teleoperation of the assistive robot platform. Teleoperation allows the human operator to vary the degree of robot autonomy depending on the complexity of the task and to respond appropriately to hazardous situations. While HIL functionality can be used to individualize robotic motion, task-based robotic autonomy enables better access to health-promoting programs and increased time efficiency for patients and healthcare professionals [18,19,20]. For example, a personalized robotic motion previously learned via teleoperation can be autonomously repeated by the robot, reducing the physical and time burden on healthcare professionals. In addition, automated treatment accelerates patient recovery, with frequency and duration of treatment being a key factor [21]. Successful treatment requires human acceptance of and compliance with both HIL and autonomous robotic application (see Fig. 1).
This social acceptance of the robot and of close physical human-robot interactions (pHRI) depends on various human factors [22], robot features [2], and task characteristics like the degree of autonomy [23]. Therefore, the development of remotely controlled assistive social robots requires an understanding of the influences on human acceptance of close pHRI in combination with teleoperated as well as autonomous systems. The development and deployment of a safe assistive social robot requires substantial funding and a long development process, as ethical, legal, and technical issues need to be resolved [24, 25]. For example, the first implementation of highly controlled user studies in a home environment with the PR2 robot required 10 years of preparation [26]. Finally, the user’s perception of the robot’s functionalities has a major impact on the willingness to use the system. Studies show that the user’s judgment of the robot’s functionalities may even make them unwilling to participate in experimental studies with a fully functional robotic system if the robot is perceived as insufficiently functional to perform its main tasks [25]. From both an economic and a social point of view, it is therefore necessary that the technical development process of such complex systems is accompanied by user evaluations even in the prototyping phase [27]. In this paper, we demonstrate a user-centered design process at the beginning of the development process of assistive social robots. To this end, we used the recently introduced research platform GARMI and considered both its autonomous and teleoperated control modes. We used verbal introductions and video sequences as well as live demonstrations of proposed functionalities to assess users’ willingness to use ten health-related robot functionalities. We aim to identify tasks that suffer from low willingness to use and therefore require further consideration and special care in the ongoing design process of the GARMI system. In addition, we want to identify a set of user characteristics that influence the willingness to use assistive social robots for health-related (p)HRI, helping to better understand the group of future GARMI users.

Fig. 1

Overview of system components enabling HIL applications integrated in the assistive social robot platform GARMI as avatar. The study analyses variables predictive of user acceptance of such avatars, focusing on the influence of varying levels of system autonomy

This paper is structured as follows. Section 2 outlines the related work. The experiment carried out to test the hypotheses is presented in Sect. 3. Then, Sect. 4 describes the results which are discussed in Sect. 5. Finally, Sect. 6 concludes the paper.

2 Related Work

This section outlines methods in the field of acceptance research and briefly summarises several parameters that influence user acceptance in interactions with assistive social robots. In particular, we present pertinent literature on acceptance evaluations of assistance tasks performed with varying levels of autonomy.

2.1 Assessing User Acceptance of Assistive Social Robots

Acceptance is generally defined as the intention to use a specific technology [28]. To understand variables influencing intention to use, the Technology Acceptance Model (TAM) was developed [29]. This model highlights aspects of practicability and usability and is sufficient for acceptance assessments of classic information technologies [22]. To evaluate the user experience of social robots, hedonic factors need to be included [30]. Hedonic factors focus more on emotional aspects of user experience such as enjoyment in HRI [31] and the degree of visual attractiveness of the robot [2].

Further, social robots are mainly used on a voluntary basis in domestic settings. Thus, usage decisions are influenced by the person’s immediate environment [22]. These additional aspects of robot acceptance are considered in the Unified Theory of Acceptance and Use of Technology (UTAUT), which became a central instrument in robotic acceptance research. The model suggests four core determinants of intention to use, namely performance expectancy, effort expectancy, social influence, and facilitating conditions. Tested in organisational settings, these four predictors are able to explain up to 70% of variation in intention to use, which directly affects actual usage behaviour [32]. Based on the UTAUT model, the authors of [31] adapted the original determinants and added seven additional constructs in order to assess the acceptance of assistive social robots. This so-called Almere Model addresses the user group of older adults, i.e. those aged 65 years and older, and usage in private homes or care-taking facilities. For assessing the target group of older adults with heterogeneous support needs, the inclusion of the construct perceived adaptability is beneficial. The evaluation of the model shows that specifically perceived usefulness of assistance and user attitude serve as the most relevant predictors of robot acceptance [31].

2.2 User Characteristics Influencing the Acceptance of Assistive Social Robots

Besides factors covered by the Almere Model, constructs like user’s attitudinal, normative, and control beliefs in HRI as well as functional characteristics of the robot itself, and users’ socio-demographical factors impact robot acceptance [22].

For example, when interacting with assistive social robots, the majority of older adults are not able to estimate the difficulty of usage, which causes anxiety even prior to the interaction. Thus, age correlates negatively with intention to use [33]. The authors of [34] found perceived enjoyment and social influence to have a statistically significant negative correlation with age. This indicates that older adults are prone to avoid using the robot as they expect interactions to be less pleasurable. Older adults tend to hold more negative attitudes towards assistive social robots and do not desire to have a robot at home [35]. However, approval increases with age for robots which help to gain independence in case of physical or cognitive impairments [36]. Nevertheless, assistive social robots for elderly care do not only interact with older adults. For instance, younger relatives, additional human care-takers, or doctors may also be involved in the ecology of elderly care-taking. Therefore, the challenge is to design a technical layout for social assistive robots that increases the perceived ease of use for both older adults and all other user groups [37].

When excluding the influence of age, women are generally more negative about robot usage in their life. Men instead are more likely to imagine personal robots to be part of their daily life, but see robots more in the role of functional tools [36]. Females focus on emotional aspects towards the robot and characteristics of provided HRIs [2]. A systematic literature review including participants of all ages shows that women have more trust when interacting with social robots, but no difference for gender was found for the variables affective attitudes towards social robots, general attitudes, acceptance, or anxiety towards social robots [38]. Also the authors of [34] detected no significant difference between genders in the ratings of Almere Model constructs.

However, previous studies found correlations of gender [33] as well as age [39] with technology or robot experience, indicating that differences in acceptance originate from moderating effects of experience rather than from the variables age and gender per se. Thus, the interactions between sociodemographic factors should be taken into account when analysing acceptance of assistive social robots. Among these variables, experience has the highest influence on acceptance [4]. However, describing a user’s prior experience with a single value fails to capture the variable in its full complexity. Therefore, the inclusion of additional factors related to experience may help to better predict influences on user acceptance. Previous studies show that a direct experience of usefulness is one of the main persuasive factors to change users’ willingness to use assistive technologies [40]. Since an actual user experience is difficult to realize in early stages of development, a live demonstration of the robot’s capabilities with the user as a spectator can increase the user’s involvement in the robot’s introduction and influence the way users build their perception of the robot [41]. Further, previous experience with pets has been shown to increase the minimum distance that a human perceives as comfortable for an approaching robot [42], which is a method to assess users’ perceived safety in pHRIs [43]. Thus, pet ownership may increase familiarity with interactions with non-human agents. As assistive social robots are developed to support tasks which are currently executed by health professionals, people with a profession in a medical or care-related field may be biased due to their higher level of experience in executing such tasks.
Acceptance research has assessed health professionals’ preference for robotic or human assistance in a variety of tasks when robots have the role of a coworker [44], but further research is required to understand their perception of assistive social robots for their private use.

2.3 Teleoperation in HRI

Teleoperation is defined as controlling a system over a distance. This technology was first implemented in the 1950s and was usually applied in robots which facilitate locomotion and manipulation in inaccessible or hazardous environments, e.g. in space, military or deep-sea applications [45]. In the last decades, however, robots increasingly found their way to home environments to, e.g. assist older adults or people with impaired abilities. These robots interact closely with humans, leading to scenarios with potential danger to humans. Consequently, to avoid hazards towards the human on the one hand but ensure efficient care-taking on the other hand, autonomous or teleoperated execution of a task may vary. Therefore, the definition of teleoperation in HRI refers to the level of autonomy as well as the spatial separation between the robot and the person controlling it [46]. To determine a robot’s level of autonomy, its position in the continuum spread between the two poles teleoperation and full autonomy is assessed. The percentage of duration that a robot is carrying out a task on its own (i.e. autonomy) and the duration that an operator is controlling the robot (i.e. intervention) are used to describe the level of autonomy [47], e.g., a teleoperatively controlled task is composed of 100% intervention and 0% autonomy. But to compare user perceptions of different levels of autonomy in acceptance studies, a classification in fixed categories is beneficial. The authors of [46] suggest a classification of ten different levels of robot autonomy (LORA). The decision which LORA is required for a specific task depends on the criticality of errors, the complexity of the environment and the accountability of errors. The latter refers to the fact that in the event of an error, e.g., in health-related tasks, the human operator should continue to feel responsible so that countermeasures are taken quickly. 
The evaluation of these task and environment variables is performed for the sub-components sensing, planning, and acting of a task. Depending on the assignment of each sub-component to the human and/or the robot, the required level of autonomy can be selected [46]. If one carries out the classification according to the presented pattern for tasks within the framework of assistive social robots, it is obvious that due to the complex environment and the requirement of high motor and cognitive skills, a possibility for teleoperation is indispensable for such robots [16]. Even though it is assumed that the level of robot autonomy of a task affects the acceptance of assistive social robots, further research to identify specific influences is required [46]. The authors of [23] show that users’ attitude towards a teleoperated robot in a work-related context is significantly more positive than towards a fully autonomous robot. This holds for situations in which the robot is perceived as technical equipment. When perceived as a coworker, users’ attitude towards the robot is generally lower and the same decreasing trend is apparent for fully autonomous robots, although without reaching significance.
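
The description of the level of autonomy by the shares of autonomy and intervention time [47] can be illustrated with a short calculation. The following is a minimal sketch and not the procedure of [47] or of this study; the function name `autonomy_shares` and its two duration parameters are assumptions chosen for illustration.

```python
def autonomy_shares(autonomy_s: float, intervention_s: float) -> tuple[float, float]:
    """Return (autonomy %, intervention %) for one task execution.

    autonomy_s: time the robot acted on its own (hypothetical input, in seconds)
    intervention_s: time an operator controlled the robot (hypothetical input, in seconds)
    """
    if autonomy_s < 0 or intervention_s < 0:
        raise ValueError("durations must be non-negative")
    total = autonomy_s + intervention_s
    if total == 0:
        raise ValueError("task duration must be positive")
    autonomy_pct = 100.0 * autonomy_s / total
    return autonomy_pct, 100.0 - autonomy_pct


# A fully teleoperated task: no autonomous time at all.
print(autonomy_shares(0.0, 120.0))
# A task in which the robot acted alone for a quarter of the time.
print(autonomy_shares(30.0, 90.0))
```

Mapping such a continuous share onto one of the ten LORA categories of [46] would additionally require the category definitions, which are not reproduced here.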

2.4 Definition and Acceptance of Telemedicine Systems

Assistive social robots are developed to allow older adults to live independently in their own homes. As the mobility required for doctor visits can be impeded, assistive social robots can facilitate remote medical care. This remote delivery of medical services between a doctor and patient is called telemedicine [48]. Currently, common telemedicine systems on the market are restricted to the exchange of audio-visual information and are not implemented in robots. Via mobile applications installed on electronic devices, several telemedical services are possible, e.g. exchange of echocardiograms [49], teleconsultation [50], or the provision of health-promoting programs [51]. These telemedicine systems without the involvement of a robot achieve good results in health improvement [52,53,54] and enjoy a high level of acceptance among doctors [55,56,57] and patients [58, 59]. To offer health professionals the possibility to navigate or manipulate objects remotely in home environments, the patient site of telemedicine systems needs to be realized with robots. Such robots can be restricted to enabling only audio-visual communication between health professionals and patients. Robots of this kind, like TRIC [60], CompanionAble [61], VGo [62] or Giraff [63], are developed to improve the patient’s psychological well-being and social participation. To collect vital signs of the patient, the robot needs to be equipped with dedicated measurement devices. These either have to be operated by the patients while being mounted on the robot trunk, as implemented with Peoplebot [64], Medical Tele-diagnosis Robot (MTR) [65] and HealthBots [25], or they can be remotely controlled by the doctor via a manipulator, as realized with ReMeDi. The latter enables professionally performed health-related pHRIs like teleoperated auscultation or ultrasound diagnostics [66].

Acceptance research for such telemedicine robots performing (p)HRIs is limited due to the current insufficient level of development [16]. However, one study conducted with the Peoplebot showed that users’ perceived quality of interaction with the robot can be significantly predicted by positive emotions and a positive attitude towards the robot prior to the interaction, whereas age, sex and computer experience have no significant influence [64]. The same results were observed in a user study conducted with the HealthBots. Users’ sociodemographic factors like age, gender and computer knowledge showed no significant correlation with the overall rating of the robot interaction or the intention to use the robot again after a 2-week test trial. Studies with the ReMeDi robot have so far been limited to the usability of its remotely controlled navigation [67] or its graphical user interface (GUI) for teleconsultations [68]. An assessment of users’ and doctors’ perceptions of remotely controlled or even autonomously conducted pHRIs in a medical setting, like auscultation or ultrasound diagnostics, would therefore expand the current state of knowledge.

2.5 Definition and Acceptance of Telerehabilitation Devices

Innovative teleoperative assistive social robots also enable remote occupational or physical therapy as well as cardiac or vocational rehabilitation, which is then called telerehabilitation [69]. Telerehabilitation robots can be divided into unilateral and bilateral systems [48]. Unilateral systems require a robot only on the patient site, which passively moves the patient’s extremities following a pre-programmed pattern. Force feedback in active movements of the patient is provided by the robot, whereas the therapist is only able to instruct the patient audio-visually but not actively intervene in the patient’s motion execution [70, 71]. In bilateral setups, respective robots on each site enable the therapist and patient to interact visually and kinesthetically with each other [48]. Thus, diagnostic examinations like joint assessments of the range of motion or “end-feel” are also feasible. The authors of [72] provide an overview of current rehabilitation robots. Whereas some unilateral systems are available, e.g. MIME [73], ARMin [74], T-WREX [75], the only bilateral system listed in the review is MIT-MANUS [76]. These telerehabilitation robots are all realized as stand-alone systems, but integration of such functions into assistive social robots would expand their healthcare capabilities. Such an integration is realized for the robot GARMI [77]. However, the development process of telerehabilitation robots still lacks studies in the field of acceptance research.

Fig. 2

Telemedicine and Telerehabilitation scenarios of GARMI with dedicated manipulator. All relevant service requests, audio-visual information, motion, and tactile parameters for the teleoperative application are communicated between patient site and doctor site via cloud service. The autonomous mode

2.6 Acceptance of Support in Activities of Daily Living

The term telecare is not clearly defined, as the tasks it includes are very broad. Generally, it covers the delivery of support provided by professionals to individuals with the aim of offering services that complement existing models of care [78]. The authors of [79] confine telecare more to services within the framework of health and social care, delivered directly to users in their home environments by means of information and communication technology. In this section, we use telecare to refer to support in activities of daily living, with a focus on the tasks of shaving and medication intake. Activities of daily living can be categorized into three groups. Basic activities of daily living (ADLs) refer to basic needs which are required to maintain one’s well-being, like eating, bathing and toileting. Instrumental activities of daily living (IADLs) enable the individual to live independently, e.g. shopping, preparing food, housekeeping and managing finances. The last category, the so-called enhanced activities of daily living (EADLs), includes activities that enable individuals to participate in social communities and to engage in hobbies [80]. In telecare scenarios where participants can decide whether they prefer human or robotic assistance, only 12% favour robotic assistance for ADLs, 50% for IADLs, and 34% for EADLs. Thus, the willingness to use telecare functionalities is highly task-specific. In the group of ADLs, shaving is the task for which participants show the lowest acceptance of robotic assistance [81]. The authors of [82] provide insight into a user’s perception of using a semi-autonomous shaving function implemented in the PR2 robot. The well-trained user succeeded in shaving his face within 54 min. He rated his experience on a 7-point Likert scale, which indicated that he felt safe during the whole experiment and that he perceived the robot as enjoyable to use.
Even though shaving requires more time with the developed system, he would prefer to use the robot instead of asking a caregiver to help [82]. Although this result is motivating, further research with a larger number of participants is needed to substantiate this finding for a more general population. In the group of IADLs, medication management is widely discussed as a task to be supported by telecare robots. Here, the preference for human or robotic assistance varies among the different parts of the task. While robotic assistance is preferred for remembering to take medication, human support is preferred for deciding which pills to take [81]. The preference for robotic assistance with a medication reminder function increases even more after users had the opportunity to experience a robot performing this task. In contrast, even after a live demonstration, participants still prefer human help in deciding on the right medication. According to the users’ statements in a follow-up interview, they strongly question the cognitive abilities of the robot to distinguish between different kinds of medicines and to reliably recognize the recipient [83]. Thus, decreasing the level of autonomy for this part of the task to include a human-in-the-loop can be assumed to increase user acceptance.

3 Experiments

In this section, we explain the underlying hypotheses for assessing the acceptance of (teleoperated) (p)HRIs in an elderly care application field as well as the conducted experimental procedure.

3.1 Assistive Service-Humanoid GARMI

For this study, we used the humanoid and teleoperative research platform GARMI [15]. This robotic platform closes the gap between companion robots and service robots by offering multimodal HRI, support in activities of daily living, and service tasks. Besides operating fully autonomously, GARMI can also be externally controlled in avatar mode. This mode allows medical experts to teleoperate GARMI, e.g., for diagnostic analysis such as auscultation and heart rate measurement. For all teleoperated applications, robotic systems are required on the patient and doctor sites. Whereas GARMI represents the patient site, the doctor site is equipped with a seven-DoF single-arm manipulator [84]. Depending on the desired application, the required manipulator functionality varies. Thus, two setups exist on the doctor site, which are depicted in Fig. 2. For medical examinations of vital signs, the manipulator is used as a haptic device to control GARMI’s motion remotely. It is mounted vertically on the doctor’s desk and offers a handle to control GARMI’s arm movements [85]. For telerehabilitation applications, the manipulator mimics the human arm physiology. It is, therefore, mounted horizontally. With this experimental setup, the doctor performs passive mobilisation or remote physical examinations, e.g., muscle function tests, directly on the robot manipulator as if it were the human arm. The forces felt by the robot manipulator are then replicated by GARMI on the patient’s arm.

3.2 Hypotheses

This study is motivated by the main research question:

“Why do people accept (or not accept) innovative functionalities of an assistive social robot?” Robots are considered accepted when they are willingly integrated in the user’s daily routines [2]. But examining such a measure at an early stage of robot development is cumbersome and sometimes not even possible due to structural or legal reasons [86]. Use intention indicates the strength of the human’s willingness to use a robot and leads directly to actual behavior [87]. Therefore, intention/willingness to use can be utilized as a dependent variable in studying user acceptance of assistive social robots [22]. The intention to use an assistive robotic device is bound to a certain application or robot functionality [2]. Therefore, we aim to identify users’ intention to use an assistive social robot based on specific functions required in medical assessments, rehabilitation, and support in daily living tasks. Table 3 lists the ten selected functions. Based on the literature research and the theoretical overview above (see Sect. 2.2), the following independent variables, displayed in Table 1, are included to predict the intention to use selected functions offered by the robot. These analyses are gathered in the following hypotheses:

\(\textbf{H1}_{\textbf{a}-\textbf{j}}\) The intention to use specific robot functionalities a–j (see Table 3) can be explained by (1) anxiety, (2) attitude, (3) perceived ease of use, (4) perceived usefulness, (5) trust, (6) age, (7) gender, (8) form of robot introduction, (9) profession, (10) years of pet ownership and (11) experience with robots.

Table 1 Independent variables of Hypotheses a–j
Table 2 Second part of Questionnaire: adapted Almere Questionnaire

To analyse the difference in acceptance of tasks with and without HIL and the influence of the method of introduction of the robot, the following hypotheses are tested:

H2 The acceptance of GARMI will be higher for HIL application than for a fully autonomous execution in auscultation, mobilisation and medicine intake.

H3 Participants show higher acceptance towards GARMI when they have received a live demonstration of the robot.

3.3 Instruments

In this study, we evaluated the users’ acceptance and personal traits using a questionnaire which contained three parts. The first part asked for relevant sociodemographic data as well as parameters related to the person’s experience with robots. From these questions, the variables Age, Gender, Experience, Profession and Pets were defined. Experience indicates the duration of years a participant dealt with any kind of robotic system. Profession is divided in two categories, namely health-related profession and non-health-related/other. Pets indicates the duration in years a participant owned a pet, divided by his/her age (see Table 1).
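
The derivation of these variables can be illustrated as follows. This is a minimal sketch under stated assumptions: the record layout, the field names, and the helper `derive_variables` are hypothetical and not taken from the study's analysis; only the definitions (Experience in years, Profession as a two-category variable, Pets as years of pet ownership divided by age) follow the description above.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    """Raw answers from the first questionnaire part (hypothetical field names)."""
    age: int
    gender: str
    robot_years: float        # years the participant dealt with robotic systems
    pet_years: float          # years the participant owned a pet
    health_profession: bool   # True for a health-related profession


def derive_variables(a: Answers) -> dict:
    """Build the variables Age, Gender, Experience, Profession and Pets."""
    if a.age <= 0:
        raise ValueError("age must be positive")
    return {
        "Age": a.age,
        "Gender": a.gender,
        "Experience": a.robot_years,
        "Profession": "health-related" if a.health_profession else "other",
        # Pets is the duration of pet ownership normalised by the participant's age.
        "Pets": a.pet_years / a.age,
    }


# Made-up example participant (not study data).
print(derive_variables(Answers(age=40, gender="female", robot_years=2,
                               pet_years=10, health_profession=True)))
```

Normalising pet ownership by age keeps the variable comparable between younger and older participants, who differ in how many years of ownership are possible at all.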

Table 3 Third part of Questionnaire: GARMI functions

The second part of the questionnaire assessed the acceptance of GARMI using the Almere questionnaire. The 41 questionnaire items assigned to the eight specific constructs are depicted in Table 2. All participants’ mother tongue was German. Thus, the original items of the Almere questionnaire were translated into German following the team approach. The items were first translated independently by two different translators. These two translations were then discussed in a joint session, led by a moderator. The translated items as well as remaining ambiguities were discussed with an adjudicator and tested within a small sample group to check whether all items were understood correctly [88]. The original Almere questionnaire was adapted to fit our use case. Therefore, the constructs

  • “Perceived Sociability” (PS),

  • “Social Influence” (SI),

  • “Intention to Use” (ITU) as well as

  • two items of “Perceived Enjoyment” (PENJ)

were excluded from our questionnaire. The remaining constructs

  • “Anxiety” (ANX),

  • “Attitude” (ATT),

  • “Facilitating Conditions” (FC),

  • “Perceived Adaptability” (PAD),

  • “Perceived Ease of Use” (PEOU),

  • “Perceived Usefulness” (PU) and

  • “Trust” (TRUST)

were rated based on a five point Likert-type scale ranging from one to five where:

  1. Totally disagree,

  2. Disagree,

  3. Don’t know,

  4. Agree, and

  5. Totally agree.

The items occurred in randomised order. The third part of the questionnaire (see Table 3) briefly presented ten selected functions of GARMI and asked the participants to rate whether they trust GARMI enough to let it help them on a semantic differential scale between scores one to five with the two poles:

  • 1: “I would never do it”, and

  • 5: “I would immediately do it”.

3.4 Procedure

The experiments were conducted at two different locations according to the method used to introduce the robot GARMI. An oral introduction was given to the participants at the German Museum (Deutsches Museum) in Munich, Germany. The second location was the so-called “robot experience center” of the Munich Institute of Robotics and Machine Intelligence (MIRMI) in Garmisch-Partenkirchen, Germany, where the general public can inform themselves about assistive robotic systems for applications in health care and daily living. Participants at this site received an additional live demonstration after the oral introduction. Participants took part at only one of the two sites, following a between-subject design for the method of introduction. Apart from the different method of introduction, the same experimental procedure was carried out at both locations, as described in more detail in the remainder of this section. The experiments took place from September to November 2021 at both locations. The participants were first informed about the study procedure and the processing of their personal data. After the participants declared their consent to take part in the study, they were given a standardized 10-min oral introduction to GARMI based on a PowerPoint presentation. The presentation included images that provide information about the size and visual appearance of GARMI to help participants form opinions about GARMI. A short overview was given of the capabilities of GARMI’s head, torso, mobile platform and arms. Further, possible applications of GARMI within the framework of elderly care were described and the possible support by HIL was explained. To ensure that the participants understood the concept of HIL, two short videos of a teleoperated auscultation and mobilisation were shown. Thus, participants recruited at the German Museum formed their opinion about GARMI based on oral information, pictures, and videos.
At the second experimental site, participants received the same introductory presentation on GARMI. After this oral introduction, they were additionally presented with a live demonstration of a teleoperated mobilisation executed via GARMI. The participants had the role of spectators in groups of five to ten. After the respective introduction with or without live demonstration, the participants were asked to fill in the aforementioned three-part questionnaire (see Fig. 3).

Fig. 3 Description of the conducted procedure at the specific experiment locations

3.5 Subjects

A total of 166 subjects participated in our experiment. At both sites, the participants were volunteers recruited from visitor groups coming to MIRMI or to the German Museum. At the German Museum, an information booth was set up to attract visitors’ attention. Here, participants were only approached after they had shown interest in our information booth of their own accord. Visitors of the robot experience center of MIRMI were invited to participate in the study at the beginning of their visit. As a visit to MIRMI requires a self-initiated registration, an approximately equal interest in the topic of assistive robotics can be assumed at both sites, which limits a self-selection bias of certain participant characteristics as far as possible. The age of the subjects ranged from 18 to 83 years, with an average of 42.7 ± 19.9 years. The group consisted of 95 female and 71 male subjects.

3.6 Data Processing

The scores of the single Likert-type items of the Almere questionnaire were averaged for each construct. For most constructs, a higher score represents a higher level of agreement. Items of the ANX construct are negatively formulated, which results in lower scores for a higher degree of anxiety. Calculating and applying mean values as a measure of central tendency requires the assumption of equal intervals between scale points. This assumption is much discussed for Likert-type scales [89, 90] and semantic differential scales [91]. When several Likert-type items are averaged, the resulting composite score is called Likert scale data, which is suggested to be treated as interval-scaled data regarding the level of measurement [90]. Using the mean and standard deviation is moreover recommended by the Almere questionnaire originator [86] and applied in various publications [92, 93]. Therefore, the Almere constructs are described by mean and standard deviation. The willingness to use GARMI functionalities, assessed with the semantic differential scale, was also analysed on the interval measurement level because, according to [94], the occurring deviation from a metric scale level is of negligible size.
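The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration only: the item names, the item-to-construct mapping, and the three example participants are hypothetical and are not taken from the study data.

```python
from statistics import mean, stdev

# Hypothetical responses: one dict per participant, item name -> rating (1..5).
responses = [
    {"PU1": 4, "PU2": 5, "PU3": 3, "TRUST1": 3, "TRUST2": 4},
    {"PU1": 2, "PU2": 3, "PU3": 3, "TRUST1": 4, "TRUST2": 4},
    {"PU1": 5, "PU2": 4, "PU3": 4, "TRUST1": 2, "TRUST2": 3},
]

# Hypothetical mapping of items to constructs.
constructs = {"PU": ["PU1", "PU2", "PU3"], "TRUST": ["TRUST1", "TRUST2"]}


def composite_scores(responses, constructs):
    """Average the items of each construct for every participant."""
    return {
        name: [mean(r[item] for item in items) for r in responses]
        for name, items in constructs.items()
    }


def describe(scores):
    """Return (mean, sample standard deviation) per construct."""
    return {name: (mean(vals), stdev(vals)) for name, vals in scores.items()}


scores = composite_scores(responses, constructs)
summary = describe(scores)
```

Each construct is then reported as mean ± standard deviation of the per-participant composite scores.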

3.7 Statistical Analyses

To verify that the translated items still load on the original Almere Model constructs, a factor analysis with a rotated component matrix was conducted. As assigning items to a common factor requires a reasonable correlation, the correlation matrix of all items was visually checked for correlations above a minimum value of 0.3. However, no item correlation should be higher than 0.9, to preclude multicollinearity [95]. Next, Bartlett’s test was used to test the null hypothesis that the correlation matrix is an identity matrix. This would indicate that the items are uncorrelated and not suited for clustering [96]. Here, Bartlett’s test was significant (\(\chi ^2\) (276) = 1618.81, \(p < 0.05\)), indicating adequate factorability. In the next step, the distribution of shared variance among the items was checked by means of the Kaiser-Meyer-Olkin measure (KMO). It is calculated as the ratio of the sum of squared correlations to the sum of squared correlations plus the sum of squared partial correlations [97]. With a value of 0.809, the data had a high sampling adequacy for conducting a factor analysis. In summary, all items were suitable for factor analysis [98].
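For reference, the two checks can be written in their common textbook form. The symbols are introduced here for illustration and are not the authors' notation: \(n\) is the sample size, \(p\) the number of items, \(R\) the item correlation matrix, \(r_{ij}\) its off-diagonal entries, and \(a_{ij}\) the corresponding partial correlations.

\[
\chi ^2 = -\left( n - 1 - \frac{2p + 5}{6} \right) \ln |R|, \qquad df = \frac{p(p-1)}{2}
\]

\[
\textrm{KMO} = \frac{\sum _{i \ne j} r_{ij}^2}{\sum _{i \ne j} r_{ij}^2 + \sum _{i \ne j} a_{ij}^2}
\]

Under this form, the reported 276 degrees of freedom correspond to \(p = 24\) items, since \(24 \cdot 23 / 2 = 276\).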

Hypothesis 1 was tested using a multiple linear regression, which analyses which variables predict the willingness to use several functionalities offered by GARMI. The dichotomous variables were included as dummy-coded variables. As a correlation among the independent variables was expected, all ten independent variables were included simultaneously in the model to avoid any effects of the order of variable inclusion, and then the method of backward elimination was applied. Hereby, the least significant variable is removed in every step until the stopping rule of \(F < .1\) is reached or no variable is left in the model. Due to a violation of the prerequisite of normally distributed residuals, backward elimination was only used as a first step to select the variables included in the model with the highest adjusted \(\textrm{R}^{2}\). Adjusted \(\textrm{R}^{2}\) indicates the predictive strength of the resulting model, showing to which extent the variation of the dependent variable can be explained by the included independent variables [99]. For significance tests in the second step, multiple linear regression with bootstrapping of 4000 samples and a bias-corrected and accelerated (BCa) 95% confidence interval was applied [100]. The predictors were added all at once, i.e. using the all-in selection method.
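The idea behind the bootstrapped significance test can be illustrated with a deliberately simplified sketch. It uses a single predictor and a plain percentile interval instead of the BCa interval and the multiple regression used in the study, and the data, the function names, and the chosen confidence level are hypothetical.

```python
import random


def slope(xs, ys):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, level, n_boot=4000, seed=0):
    """Percentile bootstrap interval for the slope (simpler than BCa)."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # slope is undefined without spread in x
            continue
        estimates.append(slope(bx, [ys[i] for i in idx]))
    estimates.sort()
    tail = (1 - level) / 2
    return estimates[int(tail * n_boot)], estimates[int((1 - tail) * n_boot) - 1]


# Hypothetical ratings of twelve participants (1..5 scales).
trust = [1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
willingness = [2, 2, 3, 3, 4, 3, 4, 5, 4, 5, 4, 5]

lo, hi = bootstrap_slope_ci(trust, willingness, level=0.95)
significant = lo > 0 or hi < 0  # interval excludes 0
```

Resampling participants with replacement avoids relying on normally distributed residuals, which is the motivation for bootstrapping given above.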

To test the assumption of hypothesis H2 that acceptance differs between tasks with a human-in-the-loop and fully autonomously controlled tasks, a Wilcoxon signed-rank test was used for the pairs MOBIA and MOBITELE, STETHA and STETHTELE, and MEDIUN and MEDIRE, respectively. Thus, a within-subject design was applied. Because the assumption of normal distribution was violated, a non-parametric analysis was used.
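A minimal sketch of the Wilcoxon signed-rank test with a normal approximation is given below. It omits the tie and continuity corrections that statistics packages usually apply, so its z-values will differ slightly from software output; the paired ratings are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, z) for paired samples; zero differences are dropped."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)  # no tie correction
    return w_plus, (w_plus - mu) / sigma


# Hypothetical paired ratings of the same seven participants.
autonomous = [4, 5, 3, 4, 2, 5, 4]
teleoperated = [5, 5, 4, 3, 4, 5, 5]

w_plus, z = wilcoxon_signed_rank(autonomous, teleoperated)
```

Because both ratings come from the same participant, the test works on the signed differences within each pair, which matches the within-subject design.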

Hypothesis H3 considered the effect of the two different introduction methods on the acceptance variables using a Mann-Whitney U test as a non-parametric analysis, because the groups were not normally distributed. As each participant was assigned to only one introduction method, a between-subject design was applied.

4 Results

In this section the results of the conducted study are described, beginning with the tests for model expandability and construct reliability, and followed by descriptive statistics of acceptance of GARMI. Further, the multiple regression analyses show which variables influence the acceptance of several functionalities offered by GARMI. Concluding, the effect of HIL application and method of robot introduction are described.

4.1 Model Expandability and Reliability

As the first step of the factor analysis, only factors with an eigenvalue \(\ge \ \)1 were chosen, according to the Kaiser-Guttman criterion. The eigenvalue indicates the extent to which a specific factor explains the variance of all questionnaire items [101]. In this study, seven factors had eigenvalues \(\ge \ \)1, together accounting for 66.23% of the total variance. Next, the varimax-rotated loading matrix shows which questionnaire items load on which factor. Our results indicated that all items of the constructs PU, ATT, and PAD loaded on the same factor, indicating a similar meaning of these constructs. All items of ANX, PEOU, TRUST, and PENJ loaded on their own factor, respectively. Items FC9 and PAD15 loaded about equally on two different factors, and items ATT6 and PEOU24 each loaded alone on a single factor. PEOU25 fell below the cut-off value of 0.4 for factor loading coefficients and was excluded from further analyses. Because items PEOU24 and ATT6 could not be assigned to the factor of any construct, they were also excluded from further analyses. In summary, the items still loaded on the original Almere constructs. Only the questionnaire items ATT6, PEOU24, and PEOU25 were excluded due to poor allocability.
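The retention rule can be illustrated with a short sketch. The eigenvalues below are hypothetical and belong to an imagined six-item questionnaire, not to the study data.

```python
def kaiser_guttman(eigenvalues, threshold=1.0):
    """Keep factors with eigenvalue >= threshold and report variance explained.

    For a correlation matrix the eigenvalues sum to the number of items,
    so each eigenvalue divided by that sum is a share of the total variance.
    """
    total = sum(eigenvalues)
    retained = [ev for ev in eigenvalues if ev >= threshold]
    explained_pct = 100 * sum(retained) / total
    return len(retained), explained_pct


# Hypothetical eigenvalues of a six-item correlation matrix (they sum to 6).
eigenvalues = [2.7, 1.5, 0.8, 0.5, 0.3, 0.2]
n_factors, explained = kaiser_guttman(eigenvalues)
```

In this toy case two factors are retained and together explain about 70% of the total variance.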

To assess internal consistency reliability, Cronbach’s \(\alpha \) was calculated for the remaining items of each construct. For constructs consisting of only two items, the Spearman–Brown coefficient is rated as the most appropriate indicator and is interpreted following the same convention [102]. A minimum score of \(\alpha = 0.7\) indicates acceptable reliability [103]. As depicted in Table 4, all constructs except PENJ, FC, and PAD could be used to assess the acceptance of assistive social robots in the population aged 18 years and older.
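Both reliability coefficients follow standard formulas and can be sketched as below. The two example items and their ratings are hypothetical; the 0.7 threshold is the one stated above.

```python
import math
from statistics import variance


def cronbach_alpha(items):
    """items: list of per-item rating lists, one rating per participant."""
    k = len(items)
    totals = [sum(ratings) for ratings in zip(*items)]  # sum score per person
    item_variance = sum(variance(ratings) for ratings in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def pearson_r(x, y):
    """Pearson correlation of two equally long rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(item_a, item_b):
    """Reliability of a two-item construct from the inter-item correlation."""
    r = pearson_r(item_a, item_b)
    return 2 * r / (1 + r)


# Hypothetical ratings of five participants on a two-item construct.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 1, 4, 3, 5]

alpha = cronbach_alpha([item_a, item_b])
sb = spearman_brown(item_a, item_b)
acceptable = sb >= 0.7
```

For two items with equal variances, as in this toy example, both coefficients coincide; they diverge when the item variances differ.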

Table 4 Internal consistency scores of Almere Model constructs

A detailed analysis of Cronbach’s \(\alpha \) with the sample split at the age of 65 years showed that PAD achieved acceptable reliability in older adults (\(\alpha \) = 0.825), but not in the younger age group (\(\alpha \) = 0.576). This confirms that adaptability can be used in the original Almere questionnaire target group, but is less relevant for younger participants [86]. The same holds for the construct PENJ. Whereas participants above 65 years showed an acceptable reliability score (Spearman–Brown coefficient = 0.809), the PENJ construct cannot be applied to younger subjects (Spearman–Brown coefficient = 0.494). FC achieved acceptable reliability in neither age group (Spearman–Brown coefficient = 0.480 for 65+ years, Spearman–Brown coefficient = 0.255 for 18–64 years). As FC measures objective factors in the environment that facilitate the usage of the robot, this construct should be used for experiments in a clear usage setting such as care facilities or the participants’ own homes. In summary, the constructs PENJ, FC, and PAD were not considered in further analyses.

4.2 Acceptance of GARMI

Figure 4 shows the scores of the Almere constructs. All constructs except PEOU were rated over the whole range of possible scores from one to five, resulting in a generally high variance among all constructs. ANX received the lowest mean value, indicating a high degree of anxiety towards GARMI, and the highest variance among all constructs (ANX = 2.68 ± 1.17). An analysis of the single ANX items showed that this results mostly from item ANX3. With a mean value of 1.77 ± 1.10, the majority of participants rated GARMI as scary. Concrete factors affecting this scary image, like GARMI’s size, level of anthropomorphism, or motion design, need to be further evaluated. The ATT score showed that participants’ general feeling about the application of GARMI is positive (ATT = 3.88 ± 0.81). Participants also felt capable of learning to use GARMI effortlessly, which is indicated by a PEOU score of 3.89. With a standard deviation of 0.77, the PEOU score presented the smallest variance (PEOU = 3.89 ± 0.77). The PU score showed that subjects considered GARMI as assistive for themselves. Here, the high variance may be an indication of the wide age range with different needs (PU = 3.48 ± 1.01). Subjects were neutral in their attitude to trust GARMI’s abilities (TRUST = 3.27 ± 0.81).

Fig. 4 Results of Almere Questionnaire. A higher score represents a higher level of agreement to the specific items assessing a construct. For ANX, lower scores represent a higher degree of anxiety

Figure 5 displays the willingness to use several functionalities offered by GARMI. Generally, the willingness to accept GARMI’s help was rated higher than the Almere construct scores, but was also accompanied by higher variances. For all functionalities, the whole range of possible scores from one to five occurred. Results showed that the majority of participants would call GARMI over to get the robot’s attention for starting an order (CALL = 4.73 ± 0.62). Participants also showed high approval of being touched by GARMI on the forearm (TOUCH = 4.26 ± 1.01) and on the wrist for measuring the heart rate (HR = 4.50 ± 0.93). This approval decreased strongly for physical contact in sensitive areas like the face, e.g., letting GARMI shave one’s face (SHAVE = 2.43 ± 1.34). Upper limb mobilisation was accepted by users for both automated and teleoperated execution (MOBIA = 4.23 ± 1.10; MOBITELE = 4.22 ± 1.10). Other medical examinations like auscultation with a stethoscope on one’s chest achieved similar approval. The comparison of mean scores and standard deviations for automated (STETHA = 4.34 ± 1.03) and teleoperated (STETHTELE = 4.43 ± 0.89) auscultation highlights that users slightly preferred an execution with a human-in-the-loop. In autonomous tasks which may endanger one’s well-being, participants preferred an opportunity for human-conducted verification. Participants would take their daily medicines delivered by GARMI if an opportunity for verification is given (MEDIRE = 4.31 ± 1.05). If no verification is possible, the mean score decreased strongly (MEDIUN = 2.81 ± 1.44).

Fig. 5 Mean and standard deviation of willingness to use several functionalities offered by GARMI. Lower scores represent lower willingness to use, higher scores represent higher willingness to use GARMI for the considered tasks

4.3 Predictive Variables of Acceptance

In the following, the results of hypothesis 1 are shown for each functionality. The result tables of the multiple linear regressions show which set of independent variables, called the model, was selected via backward elimination as best suited to explain the particular dependent variable. The adjusted \(\textrm{R}^{2}\) indicates the percentage of the variance of the dependent variable that can be explained by the model and thus the goodness-of-fit of the model. According to [104], an adjusted \(\textrm{R}^{2}\) of .02 indicates a weak goodness-of-fit, a value of .13 a moderate goodness-of-fit, and a value of .26 and above a strong goodness-of-fit. The other columns in the result tables refer to the single variables of the model. The upper value in the column Beta is the regression coefficient. It shows by how many score points the intention to use increases or decreases when the independent variable changes by one unit. The interpretation of dummy-coded variables depends on the chosen reference category. These are set to female gender, non-health-related profession, and oral introduction, respectively. Thus, a positive Beta indicates that male participants, participants with a health-related profession, and participants who received a live demonstration as introduction show a higher intention to use the particular functionality. The lower value in the column Beta shows the confidence interval. If the interval excludes the value 0, the coefficient of the specific independent variable is significantly different from 0 and thus has a significant influence on the prediction of the dependent variable. As the calculation of the confidence interval (and standard error SE) is based on the robust method of bootstrapping, it serves as the key criterion for significance decisions in multiple regression analysis if the assumption of normally distributed data cannot be met. 
The p-value is displayed for reasons of comparability to an analysis without bootstrapping, but is not the decisive criterion in the decision of significance.
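The two reading rules for the result tables can be captured in a small helper. This is an illustrative sketch: the function names are hypothetical, and the label "negligible" for values below .02 is an assumption added here, since the guideline quoted above only names the three thresholds.

```python
def goodness_of_fit(adj_r2):
    """Classify an adjusted R^2 using the thresholds .02 / .13 / .26."""
    if adj_r2 >= 0.26:
        return "strong"
    if adj_r2 >= 0.13:
        return "moderate"
    if adj_r2 >= 0.02:
        return "weak"
    return "negligible"  # assumed label; not part of the quoted guideline


def is_significant(ci_low, ci_high):
    """A bootstrap confidence interval that excludes 0 marks significance."""
    return ci_low > 0 or ci_high < 0


# R^2 values as reported in the text for CALL and SHAVE.
fit_call = goodness_of_fit(0.208)
fit_shave = goodness_of_fit(0.147)

# Hypothetical interval bounds for two predictors.
predictor_a = is_significant(0.02, 0.21)   # excludes 0
predictor_b = is_significant(-0.05, 0.18)  # includes 0
```

Applied to the values reported for CALL and SHAVE, both models fall into the moderate range.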

For hypothesis H1a, the variables ATT, PEOU, PU, TRUST, Age, Intro, and Profession are able to explain 20% of the variance in the willingness to use function CALL, which indicates a moderate goodness-of-fit according to the guidelines of [104] (CALL \(\textrm{R}^{2}\) = 0.208). As shown in Table 5, TRUST and PEOU are statistically significant predictors. Both BCa confidence intervals exclude the value 0, indicating a robust result. The regression coefficient of TRUST shows that with an increase of one point on the Likert scale measuring TRUST, the intention to use the functionality CALL increases by 0.117 score points. The same effect appears for PEOU, with an increase of 0.130 score points.

Table 5 Multiple linear regression analysis of Hypothesis H1a (CALL)

A model consisting of the variables ATT, PEOU, PU, TRUST, Age, and Pets is able to explain 23% of the variance in the willingness to use function TOUCH, achieving a moderate goodness-of-fit according to the thresholds of [104] (TOUCH \(\textrm{R}^{2}\) = 0.237). The variables PEOU, PU, and TRUST have a positive statistically significant effect on the willingness to use function TOUCH. Thus, to accept pHRI, users need to trust the robot as well as consider the robot useful and easy to use (see Table 6).

Table 6 Multiple linear regression analysis of Hypothesis H1b (TOUCH)

As listed in Table 7, the analysis of hypothesis H1c shows that 15% of the variance in the willingness to use function HR can be explained by the variables PEOU, PU, TRUST, Intro, Profession, and Pets. For pHRI with harmless tools for medical examinations, a significantly higher willingness to use is apparent for users with longer pet ownership and for users who received a live introduction of GARMI. Furthermore, users’ TRUST and PU of the robot contribute positively to the acceptance of this functionality.

Table 7 Multiple linear regression analysis of Hypothesis H1c (HR)

The test of hypothesis H1d, displayed in Table 8 reveals that out of the model’s variables ANX, PEOU, PU, TRUST, Age, Gender, Intro, and Profession, the variables Gender and Profession show a statistically significant effect in the prediction of the willingness to use GARMI’s function of shaving one’s face. A male gender and a health profession lead to a higher acceptance of this functionality. The model is able to explain 14% of variance (SHAVE \(\textrm{R}^{2}\) = 0.147).

Table 8 Multiple linear regression analysis of Hypothesis H1d (SHAVE)

Participants’ willingness to use function MOBIA can be predicted by the variables ATT, PEOU, PU, TRUST, Gender, and Intro with moderate goodness-of-fit (MOBIA \(\textrm{R}^{2}\) = 0.145). Higher values of the variables PU and TRUST significantly increase the willingness to use the functionality of an autonomous mobilisation. Further, a live demonstration for introducing the robot’s capabilities has a positive statistically significant influence (see Table 9).

Table 9 Multiple linear regression analysis of Hypothesis H1e (MOBIA)

For a teleoperated mobilisation, tested in hypothesis H1f (see Table 10), a model including variables PEOU, PU, TRUST, Intro, and Pets can explain 18% of variation (MOBITELE \(\textrm{R}^{2}\) = 0.185). With a beta coefficient of \(\beta = 0.349\) and \(\beta = 0.184\), respectively, PU and TRUST have a positive significant influence on the dependent variable.

Table 10 Multiple linear regression analysis of Hypothesis H1f (MOBITELE)

Table 11 shows the willingness to use GARMI’s functionality of an autonomously executed auscultation, which can be predicted by a model including the variables PEOU, PU, TRUST, and Gender. These variables explain 17% of the variation in the autonomous execution (STETHA \(\textrm{R}^{2}\) = 0.170). Of these variables, PU and TRUST are statistically significant predictors.

Variables PEOU, PU, TRUST, Profession, and Pets are able to predict the willingness to use a teleoperated auscultation in hypothesis H1h. Together, 19% of variation can be explained with this model. Here, PU and TRUST are able to significantly predict the willingness to use. Results show that a higher manifestation of PU and TRUST have a positive effect on the willingness to use a teleoperated auscultation (see Table 12).

The test of hypothesis H1i in Table 13 shows that the variables ATT, PEOU, TRUST, Gender, and Profession explain 12% of the variation in the willingness to use GARMI’s function to take medicine offered in unrecognizable wrapping (MEDIUN \(\textrm{R}^{2}\) = 0.129). Men as well as participants with a medical profession are significantly more willing to use this functionality of GARMI.

Variables PEOU, PU, TRUST, Age, Intro, and Profession are able to predict the willingness to use recognizable medication management in hypothesis H1j. Together, 21% of variation can be explained with this model. Here, PU, TRUST, Age, and Intro are able to significantly predict the willingness to use (see Table 14).

Table 11 Multiple linear regression analysis of Hypothesis H1g (STETHA)
Table 12 Multiple linear regression analysis of Hypothesis H1h (STETHTELE)

4.4 Influence of Human-in-the-Loop

Hypothesis H2 tested whether acceptance differs between a teleoperated and a fully autonomous execution of three exemplary GARMI functions: mobilisation, auscultation with a stethoscope, and medicine intake. The mean score of willingness to use mobilisation exercises conducted autonomously by GARMI (MOBIA = 4.23 ± 1.10) was comparable to that of the teleoperated execution (MOBITELE = 4.22 ± 1.10). No significant difference was found (\(z = -0.434\), \(p = 0.664\)). There was also no significant difference between STETHA (4.34 ± 1.03) and STETHTELE (4.43 ± 0.89) (\(z = 1.933\), \(p = 0.053\)). Results show that users highly appreciated the possibility to check the correctness of daily medicines provided by GARMI. Users rated their willingness to use GARMI’s medicine delivery function for recognizable medicines (4.31 ± 1.10) significantly higher than for unrecognizable medicines (2.81 ± 1.44) (\(z = -9.092\), \(p < .001\)).

4.5 Effect of Introduction Method

For hypothesis H3, a group of 79 participants only received a 10-min oral introduction of GARMI’s application and functionalities based on pictures and short videos, whereas 87 subjects additionally had the opportunity to observe GARMI perform a teleoperated mobilisation live. The only variable which showed a statistically significant difference is ANX (\(U = 1520.00, Z = -6.212, p < 0.001\)). Participants who only received an introduction based on pictures and videos showed a mean ANX score of 2.07 ± 0.78 and were thus more afraid of GARMI than participants who had the opportunity to see GARMI live (3.23 ± 1.20).
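The Mann-Whitney U statistic and its normal approximation can be sketched as follows. The sketch omits the tie correction, so it is not expected to reproduce software output exactly; the function names and the small example samples are hypothetical.

```python
import math


def mann_whitney_u(a, b):
    """U statistic of sample a: pairs where a exceeds b, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def z_from_u(u, n1, n2):
    """Normal approximation of U without tie correction."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu) / sigma


# Hypothetical scores of two independent groups.
u_small = mann_whitney_u([1, 2, 3], [2, 3, 4, 5])

# Reported U with the reported group sizes of 79 and 87 participants.
z_reported_u = z_from_u(1520.0, 79, 87)
```

With the reported U = 1520 and group sizes of 79 and 87, this uncorrected approximation gives a z of roughly -6.20, close to the reported Z = -6.212; the small gap is plausibly due to the tie correction applied by statistics software.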

4.6 Summary of Main Findings

In the following, the main results of the conducted study are summarized. Results show that the Almere questionnaire constructs ANX, ATT, PEOU, PU, and TRUST can be extended to a broader age range, whereas the constructs PAD and PENJ are limited to the original age group of adults older than 65 years. The missing reliability of the construct FC in either age group indicates that FC is not suitable for acceptance evaluation without an actual interaction with the robot, since the rather generally formulated questions need a specific usage context. Therefore, further analyses in this study are limited to the Almere constructs ANX, ATT, PEOU, PU, and TRUST.

Table 13 Multiple linear regression analysis of Hypothesis H1i (MEDIUN)
Table 14 Multiple linear regression analysis of Hypothesis H1j (MEDIRE)

The aforementioned Almere questionnaire constructs indicate a neutral to positive intention to use GARMI. Of these constructs, Anxiety received the lowest value (ANX = 2.68 ± 1.17). This improvable score can be attributed to the extremely low rating of item ANX3 (i.e. “I find the robot scary”; ANX3 = 1.77 ± 1.10).

Within the offered functionalities, participants are more willing to use GARMI in HRIs without physical contact (e.g. CALL = 4.73 ± 0.62) than in pHRIs (e.g. TOUCH = 4.26 ± 1.01). The willingness to use GARMI decreases drastically for pHRIs including contact in sensitive areas like the face (e.g. SHAVE = 2.43 ± 1.34). Contrary to expectations, there is no significant difference between an autonomous and a teleoperated performance of telemedicine and telerehabilitation measures with physical contact. A significant difference appears for the willingness to take daily medicines offered by GARMI. Participants are highly willing to take recognizable medicine delivered by GARMI, but are neutral to slightly negative about the intake of unrecognizable medicine. Thus, users prefer a human-in-the-loop to verify health-critical tasks. All regression models are able to significantly predict users’ willingness to use each specific functionality of GARMI while achieving a medium to high goodness-of-fit. The Almere Model constructs PU and TRUST are significant predictors for a majority of the offered functionalities. As with the variables PEOU and Pets, the higher the participants’ rating of these constructs, the more they are willing to use the offered functionalities. A male gender and a live introduction also show a positive influence on acceptance, whereas a higher age is associated with a lower intention to use robotic functionalities. A health profession increases the willingness to use rather safety-critical tasks like SHAVE and MEDIUN.

5 Discussion

In this paper, we observed high acceptance of telecare, telemedicine, and telerehabilitation functionalities offered by an assistive social robot among users of all ages. All constructs of the Almere questionnaire received neutral to positive values. However, the score on users’ perceived Anxiety of GARMI had the lowest value (ANX = 2.68 ± 1.17), indicating a high level of anxiety. The increased anxiety towards GARMI may be attributed to its human-like size, as older adults in particular seem to prefer smaller robots [105]. In comparison, the Anxiety scores of the small conversational robot iCat (ANX = 4.23 ± 0.73) [31] and the social-assistive robot NAO (ANX = 4.89 ± 0.07) [92] are higher and show less anxiety. Qualitative studies, such as follow-up surveys accompanying quantitative acceptance ratings, are required to understand the effects of GARMI’s visual appearance on its acceptance. Further, a multiple linear regression was conducted for each robot functionality to identify factors influencing the user acceptance of GARMI. The results are summarized in Table 15. Besides the already acknowledged variables like Perceived Usefulness (PU), Perceived Ease of Use (PEOU), and Trust (TRUST) [86], the method of robot introduction (Intro), the field of profession (Profession), and the duration of pet ownership (Pets) are also significant predictors for the intention to use specific robot functionalities. Contrary to the original Almere Model, TRUST has a direct influence on the intention to use nearly all robot functionalities offered by GARMI. As the Almere Model is based on studies including contactless HRIs with small pet robots or screen agents, which pose no danger to the users, it can be assumed that the influence of robot reliability on user acceptance increases for pHRIs with humanoids. Interestingly, this study observes comparable user acceptance of both teleoperated and autonomous task execution for auscultation and mobilisation. 
This finding may be explained by the robot introduction. In [106], the authors found that a humanoid introduced as a fully autonomous robot is less preferred for collaboration than a teleoperated robot. A humanoid introduced as a teleoperated robot, on the other hand, is accepted in either autonomous or teleoperated mode. In this study, GARMI was verbally introduced as a teleoperated robot and, thus, may be accepted in both control modes. For the functionality of medication management, users accept the provision of their daily medicine, but strongly prefer a human-in-the-loop for final verification. The authors of [83] also show that the majority of older adults prefer a robot reminding them of or delivering their medication, but are wary of a robot’s capability to reliably select the correct medication. Therefore, engineers should be encouraged to implement medication management as a semi-autonomous task, with humans acting as a safety redundancy to quickly verify that the robot automatically selected the correct medication. The conducted study has some limitations. First, although the process of participant acquisition aimed to obtain a comparable level of general attitudes towards robots between the two groups assessed at the two locations, a self-selection bias still needs to be assumed, as participants may hold different attitudes towards robots than non-participants. To control for this in future studies, questionnaires assessing general attitudes towards robots can be applied, like the Robot Attitudes Scale [107] or the NARS questionnaire [108]. Furthermore, user acceptance in this study is assessed as the intention to use specific functionalities of GARMI. Even if previous studies confirmed that actual usage can be predicted by intention to use [86], this correlation has to be confirmed in user studies with direct experience of the functionalities. 
But as some functionalities are not yet at the level of development required for user studies, the results of this study shall guide follow-up studies on the user’s willingness to use the robot. The study results verify the overall acceptance of all suggested functions. Therefore, none of the functions needs to be excluded from GARMI’s task list. Low scores for the willingness to use appear especially for hazardous or highly health-critical functionalities like shaving and medication management. Therefore, we suggest developing accompanying training and robotics education programs and validating their utility in order to acclimate future users to the new technology in good time. Our study results show that participants to whom GARMI is demonstrated live report significantly less anxiety towards GARMI. This emphasizes the benefit of such training programs. Furthermore, the conducted analysis of sensitive user characteristics and perceptions provides guidance on which user groups such robot training programs should particularly target. Finally, the findings of this study also support the technological development of GARMI. For example, pet ownership seems to influence the willingness to use GARMI positively. Therefore, adequate path planning algorithms and detection methods are required that account for fast-moving and highly unpredictable beings in older adults’ environments to ensure safety. The presented results are assumed to generalise only to robots similar to GARMI, as functional robot parameters like size as well as design parameters like level of humanness influence user acceptance of assistive social robots [2].

Table 15 Summary of regression analyses

6 Conclusion

As the possible use cases for assistive social robots increase rapidly with technological progress, user acceptance of assistive robot tasks becomes a key factor for successful robot integration. Following a user-centered approach, the investigation of users’ willingness to use certain robot functionalities is required in the early robot development stage. In this paper, we conducted a task-dependent application of the Almere questionnaire, aiming to understand the variables influencing the acceptance of particular robot functionalities. Alongside well-established Almere questionnaire constructs, we included additional factors influencing acceptance to broaden the understanding of user characteristics relevant for the acceptance of a robot’s autonomous and teleoperated functionalities. Besides utilitarian factors such as the perceived usability, human trust determines the willingness to use physical human-robot interaction. We also observed varying willingness to use different GARMI functionalities, emphasizing the demand for task-dependent user acceptance evaluation.