6.1 Robotic Assistants: Challenges, State of the Art, and Future Research

The design and realization of robotic agents that can master everyday activities at a human level, such as setting a table for breakfast, loading a dishwasher, and preparing a simple meal, is still a very challenging task. Despite many recent advances in Artificial Intelligence, many problems remain unsolved. In this section, we examine why the development of autonomous general-purpose robot agents is such a complex and challenging task. We also briefly sketch the current state of the art of the field and discuss how future research in AI and robotics could address these problems and overcome the existing barriers through the development of novel cognitive architectures and integrated hybrid knowledge bases for robots, which combine symbolic knowledge, fine-grained physical simulation of the real world, and powerful reasoning methods.

6.1.1 The Challenge of Manipulating the Physical World

The need to change the physical world to achieve one’s goals is arguably one of the key driving forces behind the cognitive development of the human brain (Wolpert, 2011). Therefore, creating robot agents with human-level competence in goal-directed manipulation of objects and substances has been, is, and will continue to be one of the grand research challenges in AI and robotics (Kuipers et al., 2017). We can appreciate the magnitude of this challenge by looking at the breadth and depth of skill with which humans accomplish tasks, such as pouring substances: humans can pour water out of a pot and pancake mix into a pan; they can separate egg yolk from the egg white, extinguish fire, neutralize acid, and pour beer into a glass, to name only a few variations. These pouring tasks involve different substances being poured, different containers, and different tools. They serve different purposes and have different effects. Each variation of the pouring task requires its own specific behavior patterns.

Everyday manipulation tasks are usually stated in general, vague terms. For example, when you are asked to “extinguish the fire” you have to translate this underdetermined task request into a context-specific body motion (to pour water on the fire) that is expected to achieve the desired effects and minimize the risks of unwanted side effects. This contextualization of underdetermined tasks is one of the most fundamental and challenging cognitive tasks that the human brain is capable of. The human brain harnesses powerful prospection capabilities (Williams, 2018; Szpunar et al., 2014; Jeannerod, 2001) to ensure that this contextualization typically succeeds on the first attempt, even for novel objects, tools, and context conditions, and for complex tasks. A number of researchers have stressed the essential role of prospection for effective agency. For example, Craik (1943) stated “If the organism carries a ‘small-scale model’ of external reality and of its own possible actions within its head, it is able to try out various alternatives, conclude which is the best of them, react to future situations before they arise, utilise the knowledge of past events in dealing with the present and future, and in every way to react in a much fuller, safer, and more competent manner to the emergencies which face it.”

The power of prospection becomes particularly evident in open-ended manipulation task learning, when humans learn manipulation tasks that require flexible, robust, and context-sensitive behavior in very few and often even in a single attempt by watching task demonstrations (Laird et al., 2017). Humans can learn manipulation tasks so efficiently because they understand why the demonstrated behavior achieves the task; they have intuitions about the physical properties of the objects to be acted on and expectations of their physical behavior when manipulated; they can imagine how they would generate the behavior for a demonstrated action; they anticipate the effects of the envisioned behavior even for hypothetical conditions; they transfer the observed behavior to their own bodies, objects, tasks, and contexts; and they adapt specific sub-motions to ensure task success.

While researchers across many disciplines appreciate the key role of prospection for effective agency (Szpunar et al., 2014; Vernon, 2014; Jeannerod, 2001; McDermott, 1992; Nau et al., 2004; Shanahan, 2006), the design and realization of computational models—knowledge representation and reasoning (KR&R) frameworks—that can exhibit the prospection capabilities that suffice for the one-shot contextualization of underdetermined manipulation tasks is an uncharted, high-gain research challenge.

6.1.2 State of the Art

Software agents have learned world-champion-level skills in playing Go (Silver et al., 2016; Schrittwieser et al., 2019), even with minimal hand-coded knowledge (Silver et al., 2017) or when learning other games (Mnih et al., 2015; Schrittwieser et al., 2019), including Dota 2, a multi-player video game that requires complex, continuous actions (Berner et al., 2019). The learning of physical actions has also been tackled, for example, for tasks such as solving Rubik’s Cube (OpenAI, 2019) and picking up objects (Levine et al., 2018). These breakthroughs were obtained by combining novel deep artificial neural network (reinforcement) learning architectures, methods for generating huge amounts of training data or training games, and the computing power needed to learn complex tasks from these data. These impressive performance breakthroughs do not mean that these technologies on their own can scale up to open-ended manipulation task learning (Marcus & Davis, 2021; Marcus, 2020). Perhaps the most obvious reason is that any manipulation task learning method has to avoid manipulation failures during learning as much as possible, whereas failure is an intrinsic part of deep reinforcement learning.

Learning as model building and improvement has recently gained momentum in the context of investigating computational models of cognitive development from babies to toddlers (Lake et al., 2016). Here, some models suggest that the learning agent starts with core knowledge about objects, actions, numbers, and space (Spelke & Kinzler, 2009; Spelke, 2000) and a “game engine in the brain” (Ullman et al., 2017; Schwettmann et al., 2018) as its native knowledge sources, and applies learning strategies inspired by the metaphors of the “child as a scientist” (Ullman & Tenenbaum, 2020) and the “child as a hacker” (Rule et al., 2020). This research direction proposes machine learning methods with much higher training data efficiency, better transferability of learned behaviors, and better coverage of open-ended task domains. The concepts of developmental learning are well suited for curiosity-driven, playful, explorative learning with simple toys, about which one does not have to know a lot and for which action failures are unproblematic.

6.1.3 Hybrid Knowledge Representation and Reasoning for Cognition-Enabled Robots

The realization of computational models for accomplishing everyday manipulation tasks for any object and any purpose would be a disruptive breakthrough in the creation of versatile, general-purpose robot agents; and it is a grand challenge for AI and robotics. Humans are able to accomplish tasks such as “cut up the fruit” for many types of fruit by generating a large variety of context-specific manipulation behaviors. They can typically accomplish the tasks on the first attempt despite uncertain physical conditions and novel objects. Acting so effectively requires comprehensive reasoning about the possible consequences of intended behavior before physically interacting with the real world.

Our research hypothesis is that a knowledge representation and reasoning (KR&R) framework based on explicitly-represented and machine-interpretable inner-world models can enable robots to contextualize underdetermined manipulation task requests on the first attempt. For this purpose the robot needs a hybrid symbolic/subsymbolic KR&R framework that will contextualize actions by reasoning symbolically in an abstract and generalized manner but also by reasoning with “one’s eyes and hands” through mental simulation and imagistic reasoning. This requires three breakthrough research results:

  1. modeling and parameterization of manipulation motion patterns and understanding the resulting effects under uncertain conditions,

  2. the ability to mentally simulate imagined and observed manipulation tasks to link them to the robot’s knowledge and experience, and

  3. the on-demand acquisition of task-specific causal models for novel manipulation tasks through mental physics-based simulations.
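The role of mental simulation in points 2 and 3 can be illustrated with a minimal sketch: candidate motion parameterizations are evaluated by a simulator before any physical execution, and the robot commits only to a candidate whose predicted outcome meets the task requirement. All names, parameters, and the scoring function below are illustrative assumptions, not part of the EASE systems; a real robot would query a fine-grained physics simulation instead of the toy model here.

```python
def simulate(pour_angle, pour_height):
    """Hypothetical stand-in for physics-based mental simulation.

    Returns the predicted fraction of liquid landing in the target
    container; a real system would run a physics engine here."""
    spill = abs(pour_angle - 60) / 100 + pour_height / 50
    return max(0.0, 1.0 - spill)

def contextualize(candidates, threshold=0.9):
    """Select a motion parameterization whose simulated outcome is
    predicted to achieve the task, enabling one-shot execution."""
    best, best_score = None, 0.0
    for params in candidates:
        score = simulate(**params)
        if score > best_score:
            best, best_score = params, score
    return best if best_score >= threshold else None

candidates = [{"pour_angle": a, "pour_height": h}
              for a in (30, 60, 90) for h in (5, 20)]
print(contextualize(candidates))  # the candidate predicted to succeed
```

The key design point is that failures happen only in imagination: parameterizations whose simulated effects fall below the threshold are rejected before the robot ever moves.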

The main societal impact of these breakthrough results will be the improvement of cognitive capabilities for explainable, robust, and trustworthy robot control programs that can accomplish a broad spectrum of service tasks and thereby substantially advance the field of human assistant robotics.

6.1.4 Everyday Activity Science and Engineering

In the collaborative research center EASE (“Everyday Activity Science and Engineering,” https://ease-crc.org/) we investigate the design, realization, and analysis of information processing models that enable robot agents (and humans) to master manipulation tasks that may appear simple and routine, but that are, in fact, complex and demanding.

EASE takes the perspective that the mastery of everyday activity can be formulated as the computational problem of deciding how robots have to move their bodies in order to accomplish underspecified manipulation tasks and that these decisions should be based on knowledge and reasoning.

The unique approach that EASE takes is that we investigate and develop complete robot agents that perform end-to-end context-driven manipulation tasks by leveraging

  1. explicitly-represented knowledge,

  2. explicit, inherently-adaptable generalized action plans, and

  3. powerful prospection mechanisms based on machine-understandable inner-world models.

The core of our approach lies in designing, building, and analyzing generative models for accomplishing everyday household tasks. A generative model provides the basis for a mapping from the desired outcomes of a task to the motion parameter values that are most likely to succeed in generating these outcomes. Such a model can be viewed as a joint distribution of motion parameter values and the corresponding task outcomes. In EASE, the generative model is realized through knowledge representation and reasoning, which is based on the robot’s tightly-coupled symbolic and subsymbolic knowledge about the tasks it is performing, the objects it is acting on, and the environment in which it is operating. These generative models are used to simulate various task execution candidate strategies before committing to one particular strategy to be performed in the physical world.
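Viewed as a joint distribution over motion parameter values and task outcomes, such a generative model can be conditioned on the desired outcome and queried for the most promising parameters. The following sketch approximates this with Monte Carlo rollouts in a stand-in simulator; the `rollout` function, the grip-force parameter, and the success model are hypothetical illustrations, not the EASE implementation.

```python
import random
from collections import defaultdict

random.seed(0)

def rollout(grip_force):
    """Hypothetical simulated task execution: returns True if the
    simulated manipulation succeeds. A real system would query a
    physics engine; here success probability peaks at a medium
    grip force."""
    p_success = 1.0 - abs(grip_force - 0.5)
    return random.random() < p_success

# Approximate the joint distribution of (parameter value, outcome)
# by repeated simulated rollouts, then condition on success.
counts = defaultdict(lambda: [0, 0])  # param -> [successes, trials]
for grip_force in (0.1, 0.3, 0.5, 0.7, 0.9):
    for _ in range(500):
        counts[grip_force][0] += rollout(grip_force)
        counts[grip_force][1] += 1

# Commit to the parameter value most likely to generate the outcome.
best = max(counts, key=lambda g: counts[g][0] / counts[g][1])
print(f"most promising grip force: {best}")
```

This mirrors the text's strategy of simulating candidate execution strategies before committing to one in the physical world.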

The research into generative models of everyday activities is inspired by investigations of the manner in which humans master their everyday manipulation tasks, the results of which provide the computational mechanisms that can then be used to replicate these human abilities in cognitive robots. EASE not only investigates action selection and control but also the methods needed to acquire the knowledge, skills, and competence required for flexible, reliable, and efficient mastery of these activities. Competence means that robot agents are able to translate underdetermined action requests into the appropriate behaviors and adapt their behaviors spontaneously to new situations and demands, allowing them to assist humans reliably in a wide variety of settings. Robots will have to act fluently without hesitation, understand what they are doing, communicate the reasons for their choice of behaviors, and improve performance by learning from experience, by reading, by observing, or by playing. Performing actions flexibly, robustly, and competently requires intuitive physics and commonsense reasoning in order to translate desired effects into the motion parameterizations that can achieve them.

EASE selects everyday activities as its target domain because they allow robots

  1. to structure their activities such that they exhibit regularities that can be exploited for better performance,

  2. to continually acquire readily actionable commonsense and intuitive physics knowledge, and

  3. to improve performance by specializing general actions through the exploitation of task constraints, structure, and regularities.

In the EASE Robot Household Marathon Experiment (Kazhoyan et al., 2021, https://www.youtube.com/watch?v=pv_n9FQRoZQ&t=44s) we demonstrated a generative model which enables physical robot agents to set and clean a table given vague task requests. This generative model only requires a carefully designed, generalized action plan for fetching and placing objects, which is autonomously contextualized by the model for each individual object transportation task. Thus, the robot autonomously infers the body motion that achieves the respective object transportation task and avoids unwanted side effects (e.g., knocking over a glass when placing a spoon on the table) depending on the type and state of the object to be transported (be it a spoon, bowl, cereal box, milk box, or mug), the original location (be it the drawer, the high drawer, or the table), and the task context (be it setting or cleaning the table, loading the dishwasher, or throwing away items). The body motions generated to perform the actions are varied and complex and, when required, include subactions such as opening and closing containers, as well as coordinated, bimanual manipulation tasks (Fig. 6.1).

Fig. 6.1 PR2 robot setting a table in the EASE Household Marathon Experiment

We were able to show that the competence of the generative model can be increased by asserting additional generalized domain, commonsense, and intuitive physics knowledge and reasoning, and that substantial parts of such knowledge can be acquired by the robot itself through experience, observation, and taking advice. In addition, the model exhibits impressive introspective capabilities that enable the robot agents employing it to answer questions about what they are doing, why, how, what they expect to happen, and so on. In simulation, we accomplished this scenario in even more variations, such as different kitchen setups with different furniture arrangements, on different robot platforms, and we also applied our generalized fetch and place plan in different domains, specifically retail and assembly domains.
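The per-object, per-context contextualization of a single generalized fetch-and-place plan can be caricatured as follows; the dispatch rules are illustrative stand-ins for the knowledge-based inference an actual robot control program performs, and all symbol names are hypothetical.

```python
def fetch_and_place(obj, context):
    """One generalized plan, contextualized per object and task.

    All rules below are illustrative stand-ins for knowledge-based
    inference in a real robot control program."""
    plan = []
    # Contextualize the grasp by object type.
    grasp = {"spoon": "top-grasp", "mug": "handle-grasp",
             "cereal-box": "side-grasp"}.get(obj["type"], "power-grasp")
    # Opening a container is only needed for some source locations.
    if obj["location"] in ("drawer", "dishwasher"):
        plan.append(f"open {obj['location']}")
    plan.append(f"pick up {obj['type']} with {grasp}")
    if obj["location"] in ("drawer", "dishwasher"):
        plan.append(f"close {obj['location']}")
    # The goal pose depends on the task context.
    target = "table" if context == "set-table" else "dishwasher"
    plan.append(f"place {obj['type']} on {target}")
    return plan

for step in fetch_and_place({"type": "spoon", "location": "drawer"},
                            "set-table"):
    print(step)
```

The same plan skeleton yields different body-motion sequences for a mug on the table during cleaning than for a spoon in a drawer during table setting, which is the essence of contextualization.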

Our future research will focus on the integration of our approach to parametrized general planning and the hybrid KR&R framework into a general cognitive architecture for autonomous robots. On the basis of this architecture we will design, implement, and experimentally investigate robots that can successfully interact with other robots and with humans in virtual and physical environments. This involves a transition from a focus on goals, intentions, and actions, to shared goals, shared intentions, and joint action, requiring the use of powerful mechanisms such as implicit communication.

6.2 Communication with Robots

The expert responses also reflect the challenge of enabling communication with robots. We formulated 11 items on human–robot communication in everyday life, all looking ahead to 2030. Will robots then tend to replace humans situationally in interpersonal communication? Will specialized robots then provide psychological advice (counseling)? Will humans then trust AI more than other humans? Will AI assist in rational choice (guidance)? Will humans first seek a doctor’s advice from a robot in telemedicine (consultation)? As detailed elsewhere (Engel & Dahlhaus, 2022, p. 358, Table 20.A2), the answer to all these questions is probably not. While the expert group is thus quite pessimistic about robots providing the required guidance, counseling, and consultation, it appears undecided when it comes to communication with personal avatars. In this regard, the experts consider it only possible that lifelogging will result in humans communicating with personal avatars and that such avatars will have become steady advisory life companions. The same undecided tendency characterizes the response to the statement that robots will keep lonely people of different ages company at home. In contrast, three scenarios appear more likely than unlikely: the group of experts expects that robots will keep older people company at home, that bots will communicate as well as humans, and that AI and robots will take on increasingly more assistance functions in human life.

6.2.1 The Challenge of Enabling Robotic Skills

At first glance, the opinions the experts expressed in these answers turn out to be quite pessimistic. On the one hand, this may be because sample tasks such as counseling, guidance, and consultation require ambitious functional and extra-functional (cognitive and emotional) skills, such as empathy, and the experts do not assume that robots will already possess them by the survey’s reference year, 2030. On the other hand, the answers may also reflect the fact that potential users will not necessarily accept a robot simply because it appears competent. We want to shed light on both aspects. While in this section we ask about the temporal perspective on realizing a broad spectrum of robotic skills, in the following section we aim to shed light on the question of user acceptance.

We formulated the survey question as follows: “The technical development of AI includes the solution of highly complex tasks. By when, do you suspect, will AI have the following capabilities?” The response scale follows:

  1. is already possible

  2. by 2025

  3. by 2030

  4. by 2035

  5. by 2040

  6. by 2045

  7. by 2050

  8. later

  9. will not be possible at all

Figure 6.2 shows a wide range in the mean expected time periods and considerable uncertainty in the underlying temporal estimates. Regarding the functions expected only in the longer term, the temporal estimates exhibit a particularly large spread. Table 6.1 explains which functions are graphed in each case.

Fig. 6.2 Robotic skills: By when? (Box plots of the time periods by which experts expect 13 technical skills of AI to be developed)

Confirmatory factor analysis of these temporal estimates suggests the grouping that the left column of Table 6.1 indicates. Even if usable only as a first orientation, the CFA suggests a relatively clear pattern of relations between these assessments. Skills B, C, and D constitute a first factor that appears to cover robotic self-control and motor skills, associated with autonomously carrying out physical tasks on people and moving around the rooms of an apartment like a human. This factor correlates highly (r = 0.71) with a second factor, constituted by skills A, J, and K, which also covers aspects of robotic self-control and motor skills, this time regarding autonomously moving in space and performing physical tasks. Striking here is the larger time spread between the individual functions. With G, E, and F, the next factor represents the cognitive abilities necessary to conduct personal conversations. Finally, a factor that appears to indicate cognitive and creative skills like a human’s covers the last four skills, H, I, L, and M. Thus, by and large, we observe a sequence in which robotic self-control and motor skills come first and cognitive functions come next.
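The intuition behind such a factor grouping (items loading on the same factor correlate more strongly with each other than with items of other factors) can be reproduced on simulated data. This is an exploratory illustration with made-up numbers, not the CFA reported here:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Simulated temporal estimates for six skills: two latent factors,
# each driving three observed items (illustrative data only).
motor = rng.normal(size=n)       # latent "self-control/motor" factor
cognitive = rng.normal(size=n)   # latent "cognitive" factor
items = np.column_stack([
    motor + 0.4 * rng.normal(size=n),      # skill B
    motor + 0.4 * rng.normal(size=n),      # skill C
    motor + 0.4 * rng.normal(size=n),      # skill D
    cognitive + 0.4 * rng.normal(size=n),  # skill G
    cognitive + 0.4 * rng.normal(size=n),  # skill E
    cognitive + 0.4 * rng.normal(size=n),  # skill F
])

r = np.corrcoef(items, rowvar=False)
within = np.mean([r[0, 1], r[0, 2], r[1, 2], r[3, 4], r[3, 5], r[4, 5]])
between = r[:3, 3:].mean()
print(f"mean within-factor r:  {within:.2f}")
print(f"mean between-factor r: {between:.2f}")
```

In such data, within-factor correlations are high while between-factor correlations stay near zero, which is the pattern a CFA formalizes and tests.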

Table 6.1 The technical skills of AI graphed in Fig. 6.2

6.2.2 Expected Robotic Skills and the Challenge of Communication with Robots

Robotic assistance implies repeated encounters, several times a day over a long period of time. This leads us to expect that such encounters will only be accepted if they are sufficiently pleasant. This is challenging in two ways. On the one hand, robots must be equipped with the necessary motor, cognitive, and communicative skills, and on the other hand, people must be able to imagine interacting with intelligent machines. Interacting with a robot implies communicating with it. But that is exactly what people cannot quite imagine today. The results in Chap. 1 show that people today still find it very difficult to imagine conversations with robots. Analogously, we also see in the present context that communicative skills are not primarily expected from an assistant robot. On the contrary, for a large majority of respondents to our population survey, assistant robots should not even have the ability to conduct personal conversations. Taken together, this creates a complicated situation. The reason is a remarkable correlation between the scales talk and care that we introduced in Chap. 1. “Talk” reflects the personal readiness to have conversations with robots, and “care” reflects the readiness to have robots assist in one’s care. Both scales are strongly correlated (r = 0.66), indicating that the stronger the readiness is in one respect, the stronger it is in the other. Conversely, with no willingness to communicate with robots, there is no willingness to include care robots in one’s life if necessary. The present section takes up these two scales and relates them to the qualification profile of an assistant robot as it emerges from the respondents’ preferences.

Fig. 6.3 Expected skills of care robots (triangle plot of nine skills over the response shares “yes,” “no,” and “other”; fetching and taking away items is the most preferred skill)

We worded the survey question this way: “Provided that an assistant robot would later be able to perform the following tasks competently, reliably, and without errors: For what types of activities and conversations with people in need of care should an assistant robot be specially trained? What kind of conversations and activities should remain taboo for an assistant robot?” This was followed by the items Table 6.2 displays. For each item, respondents were asked to choose between three responses, whose distributions Fig. 6.3 graphs:

  • “An assistant robot should be specially trained for this” (bottom left [0] to top [1.0])

  • “An assistant robot should rather be trained for other tasks” (top [0] to bottom right [1.0]) and

  • “An assistant robot should not be able to do this” (bottom right [0] to bottom left [1.0]).

By locating each skill in the triangle of these three answers, the graphic conveys a good impression of the polarization pattern of the involved skills. The items form a narrow band along the no-yes poles; only one item (i.e., everyday conversation) has a value greater than 0.2 for the response option “Other.” Nearly all respondents prefer picking up and taking away items; only 2.5% would ban that skill. Shares between 0.6 and 0.8 preferred three further skills, while shares of less than 0.2 banned them: maintaining (emergency) contacts, monitoring medication, and playing cards/board games. The pronounced reverse of this high/low pattern holds for personal conversation, which few approved and many rejected. Personal and everyday conversation are the two skills that reach the lowest acceptance and, at the same time, the highest and third-highest rejection.
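The placement of each skill in the triangle corresponds to barycentric coordinates over the three response shares. A minimal sketch of this mapping follows; the corner assignment is an assumption for illustration, not necessarily the layout used in Fig. 6.3.

```python
import math

# Corners of an equilateral triangle: "no" bottom-left, "other" top,
# "yes" bottom-right (this corner assignment is illustrative).
CORNERS = {"no": (0.0, 0.0),
           "other": (0.5, math.sqrt(3) / 2),
           "yes": (1.0, 0.0)}

def ternary_xy(yes, other, no):
    """Map response shares (summing to 1) to 2D plot coordinates
    via barycentric interpolation of the triangle corners."""
    assert abs(yes + other + no - 1.0) < 1e-9
    weights = (("yes", yes), ("other", other), ("no", no))
    x = sum(w * CORNERS[k][0] for k, w in weights)
    y = sum(w * CORNERS[k][1] for k, w in weights)
    return x, y

# A skill endorsed by nearly everyone sits close to the "yes" corner:
print(ternary_xy(0.95, 0.025, 0.025))
```

A skill with mostly "yes" responses lands near the yes corner, while the narrow band along the no-yes edge reported above corresponds to items with small "other" shares.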

Table 6.2 Expected skills of care robots

How does the preferred qualification profile of assistant robots correlate with the respondents’ willingness to talk with them and to use them as care robots? To find out, we use the factor-score scales talk and care introduced in Chap. 1 and correlate them with the robotic skills the present section considers. The analysis shows that whether someone thinks an assistant robot should be trained for a special task depends sometimes less and sometimes more on personal readiness. For instance, “pick up and take away items” as well as the maintenance of “(emergency) contacts” correlate least with both talk and care; personal readiness then makes only a small difference. The situation is different with “help with personal hygiene,” for which we observe a small correlation with talk (0.26) and a high correlation with care (0.61). Thus, assistance robots receive this task primarily from respondents who can imagine using the services of a care robot for themselves or a close relative. The skills we expect of an assistant robot are therefore partly quite independent of personal readiness and partly dependent upon it. This also applies to communication skills: both everyday and personal conversation correlate moderately to strongly with the personal willingness to talk and care.

6.2.3 Correlates of Talk and Care with Pictures of Robots

To make it easier for the interviewees to get started in the interview, we wanted to know what they associate with the term robot. We presented 12 pictures showing different types of robots and asked, “When talk turns to ‘robots’: What do you spontaneously associate with this term?” Pepper makes it to number 1, probably due to its frequent media presence. It also comes as no surprise that a typical industrial robot frequently makes it into the TOP 3 preference set. PR2 from EASE and the Care-O-bot 4 service robot from Fraunhofer IPA also often correspond to the spontaneously expected image of a robot.

Most of the 12 pictures of robots presented to the respondents do not correlate with talk or care. Of the nine machine-like robots, only the care robot “Service-Assistant” from Fraunhofer IPA shows weak but statistically significant correlations with these scales (talk: r = 0.21, b/s.e = 3.0; care: r = 0.29, b/s.e = 4.1). Regarding the three robots with suggested human-like physical attributes (head and arms), two correlate with talk but not with care: Fraunhofer IPA’s “Care-O-bot 4” (r = 0.22; b/s.e = 3.2) and the popular “Pepper” (r = 0.14, b/s.e = 2.1).

How People Perceive Social Robots

Intended as an easy entry to the survey, the above correlation analysis is a coincidental by-product rather than a systematic exploration of the physical attractiveness of robots. However, we can refer to reviews and other studies. For instance, following Bartneck et al. (2020), humans’ inclination toward anthropomorphism is likely to assign assistance robots the role of digital companions in daily interaction. Lum (2020, pp. 145–146) discusses human–robot interaction “outside of industrial and manufacturing of products” and underlines the sociability of robots as “becoming an increasingly important component that robots may need in order to interact in a human world.” She also stresses the need “to focus on anthropomorphism directly” and concludes from her review that “one of the main challenges when designing robots will be people’s acceptance of robots sharing their daily lives” (Lum, 2020, p. 148). Stroessner (2020) identifies three dimensions in the perception of robot faces (warmth, competence, and discomfort) and reviews findings that underline the relevance of gender-typicality and humanlike vs. machinelike faces in terms of evaluative responses and contact desirability (Stroessner, 2020, p. 38). Liu et al. (2022) present a recent study on people’s perceptions of social robots. They examine “how appearance and characteristic narrative, combined with warmth and competence perceptions, impact people’s perceptions and acceptance of robots” (p. 324) and found, for instance, that “competent robots are preferred over warm robots, and appearance design is more effective than a characteristic narrative” (Liu et al., 2022, p. 338).

Human–Robot Interaction in Care and Daily Life

The development of assistance robots, especially for use near humans, poses a major challenge in various respects. Enabling cognitive functions and a well-functioning interplay of cognitive, communicative, and motor skills places high demands on the art of programming and robot construction. In addition, there are design questions to solve. Robots may have very different shapes, look machine-like or human-like, convey a warm and competent impression, and trigger positive reactions or feelings of discomfort. Accordingly, people may not always find it desirable to interact and communicate with such robots. Given the current lack of willingness to hold conversations with robots, solving these design issues may greatly help the further development of assistance robots. Naturally, we will best achieve this through interdisciplinary cooperation between robotics, cognitive science, psychology, and sociology.