1 Introduction

The unceasing technological development of the last decades has brought many advances to our society. Among these new advances, the development of autonomous robots that operate without human supervision opens a wide range of possibilities in tasks that can be dangerous for humans, are repetitive, or where the workforce is scarce (e.g. nursing). In these scenarios, autonomous robots usually have to assist people and interact with them, so endowing these machines with social behaviour is essential.

According to [1], social behaviour can be defined as “all behaviour that influences, or is influenced by, other members of the same species”. Therefore, since social robots are intended to interact with and assist people, emulating human behaviour and decision-making so that these systems autonomously fulfil their tasks enables better cooperation between social robots and their users [2]. Nevertheless, emulating biological functions in robots is not easy, as many concepts intertwine to shape human behaviour. The artificial life community has typically addressed this challenge using ethological (the study of animal behaviour) approaches, where terms like perception, cognition, emotion and affect, homeostasis, motivation, learning, or social interaction are widely used. Next, we define these terms to help frame and understand the importance of these concepts for the autonomous and social behaviour of social robots, the topic of this review.

  • Perception: Human perception can be defined as our primary form of cognitive contact with the world around us [3]. Therefore, in robotics, it refers to the capacity to perceive the external environment.

  • Cognition: This term refers to the human ability to know, learn, and understand things [4]. Consequently, designing cognitive systems implies making them capable of reasoning about their actions.

  • Emotion and affect: Emotions are mental states derived from the situations we experience, sometimes translated into physical responses [5]. Thus, emotion and affect are typically used in robotics to emulate how the robot feels as a result of its experiences.

  • Homeostasis: The regulation by an organism of all aspects of its internal environment [6]. In robotics, it means emulating animal functions such as heart rate to regulate internal functions.

  • Motivation: Motivation is what urges and drives behaviour [7]. It is closely related to perception and physiological needs as the basis of behaviour selection and execution.

  • Learning: According to [6], learning implies gaining knowledge from study and experience. In robotics, it refers to improving the robot’s behaviour using past experiences after interacting with the environment. In social robots, the typical approach is Reinforcement Learning (RL).

  • Social interaction: Social interaction can be defined as any process involving reciprocal stimulation or response between two or more individuals [6].

Since the late 1990s, many social robots with autonomous behaviour have been designed in areas such as education [8, 9], healthcare [10], companionship [11], or social interaction [12,13,14], emulating many of the previous ideas. In social scenarios, the interaction dynamics are typically unknown and unpredictable, so robots working in these environments must have appropriate decision-making capabilities to autonomously select their actions and successfully fulfil the task for which they are intended [15].

Considering these facts, since the early 1990s, many researchers have focused on investigating how to endow robots with decision-making capabilities and have designed many models, typically emulating animal behaviour [16]. Nowadays, autonomous and social robots are deployed in many scenarios as promising systems aiming to improve our quality of life. Nevertheless, to continue enhancing these systems’ capabilities, we believe that a deeper analysis of the current situation of decision-making and control architectures is necessary, assessing their evolution over the years and framing their challenges and future goals. Previously, Cao et al. [10] described the state of the art in behavioural models for social robots in healthcare. However, we have not found any review that addresses the evolution of decision-making systems (DMSs) for autonomous and social robots. For this reason, we propose this contribution to fill this gap in the literature by providing a comprehensive overview of control architectures for autonomous and social robots.

This manuscript reviews the evolution and trends of DMSs and control architectures for autonomous and social robots in the last three decades. Moreover, we analyse how these systems have evolved in their application to specific areas, the duration of their operation, the included learning methods and the use of biologically inspired models that emulate animal (human) decision-making. From this analysis, we evaluate some of the principal challenges of DMSs and control architectures to envision future work that may help overcome some of their main limitations.

Table 1 List of keywords used for searching the contributions included in our analysis

This manuscript is organised as follows. Section 2 presents the materials and methods followed during our study. Section 3 reviews the state of the art of DMSs and control architectures for autonomous and social robots by area of application. Next, Sect. 4 analyses the results of our survey over the last three decades, attempting to thoroughly study the tendencies these systems have experienced across decades. Then, in Sect. 5, we delve into the challenges that autonomous and social robots have to tackle, emphasising those aspects identified in our prior analysis. Section 6 addresses the future of DMSs for artificial embodied agents, drawing on our own experience as designers. Finally, we provide extensive comparative tables of the work reviewed in this manuscript, organised by area, in the “Appendix” section.

2 Materials and Methods

This section presents the methodology, based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), that we followed during our study to select the most appropriate contributions on control architectures and DMSs for autonomous and social robots.

2.1 Study Selection Procedures

This manuscript analyses empirical studies from the last three decades, as the bulk of the contributions in the area were carried out within this period. The bibliography used to build our database and perform the analysis was compiled by searching the Google Scholar, Scopus, and Web of Science electronic databases. These databases were selected due to the ample number of publications they contain and because they are used worldwide. Table 1 contains the queries employed for building the database used to conduct our examination.

The use of these keywords resulted in 182 hits in Google Scholar, 18 in Scopus, and 8 in Web of Science. The search was first conducted in Google Scholar, then Scopus, and finally Web of Science, obtaining 208 hits without duplicates. Unfortunately, we could not obtain the full text of 5 works from this list, leading to a final number of 203 works to be screened. After reading the title and abstract of these works, we excluded 31 papers because (i) they were not written in English or (ii) the architectures presented were for fully teleoperated robots. Consequently, 172 full-text articles were assessed for eligibility.

Finally, we selected 148 works out of the 172 possibilities because they fulfilled our final requirements. These requirements were (i) describing the action selection or decision-making method for generating autonomous behaviour, (ii) involving humans in the decision-making process, (iii) indicating whether they had been applied in real robots or just in simulation, and (iv) describing the system application. Figure 1 shows the process of identification, screening, eligibility, and inclusion in the analysis.

Fig. 1

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram [17] that represents the process carried out in our analysis. The records out of scope refer to those articles that present social robots but do not describe their DMS. We excluded 23 articles after evaluating their full-text because either: they addressed decision-making from a conceptual point of view without presenting a model (15), no DMS was presented (5), or the study was not about the control of autonomous and social robots (3)

3 Review

The following review thoroughly describes the most notable decision-making and control systems for autonomous and social agents developed in the last three decades. After carrying out an extensive review of contributions describing DMSs and exploring the areas where these systems are applied, we opted to review and narrate the works in the following categories:

  • Research: In this category, we classify those publications that present decision-making and control systems as conceptual models not applied to any specific domain but purely designed for research.

  • Manufacturing: This category clusters contributions that present decision systems used in manufacturing and production environments, such as factories.

  • Healthcare: We classified the publications where a robot with autonomous decision-making improves people’s healthcare.

  • Education: This category includes contributions that promote people’s learning by using autonomous and social robots.

  • Entertainment: The contributions where decision-making and control architectures are used for the users’ entertainment are in this category.

  • Companionship: This category contains publications where DMSs are integrated into robots that provide companionship to vulnerable sectors of society, such as older adults.

  • Assistance and service: In this area, we present those publications concerned with assisting people and providing them with essential services to facilitate their tasks.

Then, Sect. 4 studies the number of works per decade and area, the evolution of the action selection and learning methods, whether these works use bioinspiration, the HRI duration of the experiments where the architectures are integrated, and whether they were applied in real scenarios and on a real robot. These classes were selected to provide an accurate vision of the evolution and challenges of these systems. Besides, we use them in the comparative tables included in the “Appendix” section of this manuscript.

3.1 Decision-Making Systems in Research

In the last thirty years, many contributions have described decision-making and control systems for robots. Since the term social robot was not coined until the 1990s, as Fig. 2 shows, our review starts with action selection architectures intended for autonomous robots. Then, with the rise of social robots, we provide a more detailed vision of architectures designed explicitly for social contexts and, more specifically, for Human–Robot Interaction (HRI). A comparative analysis of the works described in this section is in “Appendix A”.

3.1.1 The 1990s: Initial Research Models

We begin this survey with Meyer and Wilson [18], who, at the beginning of the 1990s, presented a book about making robots intelligent and autonomous, providing insights into how to replicate human behaviour in robots using ideas previously published by Lorenz [19] and Tinbergen [20].

One year later, Matarić [21] designed a framework for the autonomous navigation of mobile robots. The robot uses a compass and sonars, as well as if–then rules, to accomplish its navigation goals. Similarly, Mahadevan and Connell [22] proposed the autonomous control of a robot, but this time using Q-learning combined with statistical clustering to select actions. They were the first to use Reinforcement Learning (RL) in autonomous robots, a widespread technique nowadays. In the same year, Elliot [23] designed a multi-agent virtual world to simulate the emotional behaviour of autonomous agents. Each agent has a personality generated from the emotion model of Ortony et al. [24], and the behaviour of the agent depends on that personality. The study aimed to analyse the role of emotion in decision-making and behaviour.
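
To make this early use of RL for action selection concrete, the sketch below shows a tabular Q-learning update with epsilon-greedy selection. It is a minimal illustration under assumed states, actions, and parameters; it does not reproduce the statistical clustering of sonar states used in [22].

```python
import random
from collections import defaultdict

# Illustrative action set and parameters; not those of Mahadevan and Connell [22].
ACTIONS = ["forward", "turn_left", "turn_right"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

q_table = defaultdict(float)  # (state, action) -> estimated value

def select_action(state):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, reward, next_state):
    """One-step Q-learning update: Q <- Q + alpha * (target - Q)."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (target - q_table[(state, action)])
```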

Dorigo and Schnepf [25] designed a conceptual robot controller that can adapt to a dynamic environment. The robot incorporates Genetic Learning [26] to update its behaviour depending on the state of the environment as perceived by the robot’s sensors. Then, using an arbitration system, a set of rules selects an action. Like the previous paper, Hayes and Demiris [27] presented a model based on learning by imitation, where a robot selected its behaviour by perceiving the actions of a teacher robot. The novelty resides in knowing when to carry out learning depending on the usefulness of the teacher’s action. Continuing in the design of autonomous mobile robots, Nolfi et al. [28] analysed in 1994 how to conceptually design autonomous mobile robots using evolutionary approaches, providing different neural controllers to evaluate the behaviour exhibited by the robot and obtain the best solution for each situation. In similar scenarios, García et al. [29] explored in 1995 how to make autonomous robots work in navigation tasks, focusing on obtaining a scalable and modular model based on rules organised in decision trees.

By the mid-1990s, the tendency started to change, with the development of models inspired by nature. This does not mean that researchers abandoned probabilistic and rule-based models, but the number of publications emulating biological functions in robots grew notably. In this line, Steels [30] explored how to address autonomy and intelligence in artificial agents from a biologically inspired perspective. The author stated that biologically inspired decision-making oriented towards the agent’s survival is essential for building more capable robots. Deepening this concept, Webb [31] presented in 1995 a publication on emulating the behaviour of crickets in a robot. The study’s goal was to better understand animal ethology and the sensorimotor problems of animal-like robots. The decision-making consisted of selecting the best action depending on the robot’s state.

During those years, many authors started their research careers in autonomous robots. Several of them took the 1996 work of Velásquez [32] as a reference model for representing emotion and motivation in autonomous artificial agents. The architecture includes many essential biological aspects of humans, characterizing how we perceive the environment to make decisions. The dissertation offers valuable insights into building emotional and intelligent agents. In addition, the system was tested in the social robot Simon to work in HRI.

A couple of years later, in 1998, Velásquez [33] developed a new model for the autonomous decision-making of artificial emotional agents. The model simulates internal deficits that originate with the emotional responses to perceptions. Then, emotions, perceptions, and deficits influence the robot’s decision-making to produce appropriate actuation commands. The system was tested in different robots to explore the role of emotions in selecting actions. Using some of Velásquez’s ideas, Webster [34] introduced in 1997 the basics of emotional computing and intelligent processing to attain autonomous behaviour, positing that autonomous agents require reasoning and emotion to adapt to dynamic and complex situations. Like Webster, Arkin [35] studied how to endow a robot with autonomous behaviour, but from a more motivational perspective, addressing important aspects of human behaviour such as socialization, adaptation, and perception, from the perspective of both deliberative and reactive processes.

Cañamero presented in 1997 one of her first publications [36] on autonomous artificial agents with biologically inspired behaviour, describing a newborn agent living in a virtual world whose primary goal is to survive. The model shapes essential functions like physiological deficits, motivations, and emotion, allowing the agent to exhibit fully autonomous behaviour. In addition, the agent needs to interact socially with other virtual agents to reduce its social needs and obtain resources, for which it incorporates learning mechanisms. The action selection consists of reducing the deficit associated with the motivational state with the highest level of intensity. As we will present later, this author updated this initial model on many occasions, applying it to HRI scenarios.

Moving back to fuzzy control and mobile robots, Tunstel et al. [37] presented in 1997 a DMS based on fuzzy rules for autonomous navigation. The fuzzy rules evaluate the robot’s goal and the sensory information to generate appropriate behaviours and fulfil the predefined task. Similarly, El-Nasr and Skubic [38] proposed a DMS based on fuzzy control and emotion for autonomous mobile robots. The system evaluates the robot’s internal and external state, allowing it to react to unexpected situations. The model explored the significant role of emotion in decision-making, paying particular attention to negative emotions such as fear, pain, or anger. Continuing in this line, Arsene and Zalzala [39] designed in 1999 a fuzzy controller for autonomous navigation in complex environments. The robot’s decision-making combines a deliberative task planner based on fuzzy rules with a reactive layer for collision avoidance.

As did the previous authors, Matarić [40] explored in 1998 how mobile robots should produce autonomous behaviour based on biologically inspired concepts such as learning and adaptation. The paper identifies coordinating multiple behaviours and working in multi-agent environments as significant challenges for mobile robots. In this line, Reif and Wang [41] presented one year later a DMS for the distributed motion control of mobile robots working in groups. The decisions of each robot are based on potential fields, so the action taken by a robot depends on the position and potential field of the other robots. The field can be attractive or repulsive, depending on the social relations between the robots. Adding to the previous literature, Ishiguro et al. [42] presented in 1999 a robot controller for autonomous mobile robots. In this case, the action selection depended on a situated planner module that generated appropriate paths for navigating safely using the information provided by the sensors.
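
As a rough illustration of the potential-field idea used by Reif and Wang [41], the sketch below composes an attractive force towards a goal with repulsive forces from the other robots. The gains, the inverse-square decay, and the step size are illustrative assumptions rather than the authors' formulation.

```python
import math

def potential_field_step(robot_pos, goal_pos, peer_positions,
                         k_att=1.0, k_rep=0.5, step=0.1):
    """Move one step along a simple potential field: an attractive pull
    towards the goal plus repulsion from the other robots.  A negative
    k_rep for a given peer would model an attractive social relation."""
    fx = k_att * (goal_pos[0] - robot_pos[0])
    fy = k_att * (goal_pos[1] - robot_pos[1])
    for px, py in peer_positions:
        dx, dy = robot_pos[0] - px, robot_pos[1] - py
        d = math.hypot(dx, dy) or 1e-6
        fx += k_rep * dx / d**3   # repulsion magnitude decays with distance squared
        fy += k_rep * dy / d**3
    norm = math.hypot(fx, fy) or 1e-6
    return (robot_pos[0] + step * fx / norm, robot_pos[1] + step * fy / norm)

# Example: a robot at the origin moving towards (5, 5) while avoiding a peer at (1, 1).
print(potential_field_step((0.0, 0.0), (5.0, 5.0), [(1.0, 1.0)]))
```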

In 1999, the term autonomous social robot started to appear frequently in the literature. Billard and Dautenhahn [43] designed a DMS for robots to imitate the behaviour of other robots in social tasks. Similar to and contemporary with the previous two publications, Rooney et al. [44] developed a decision-maker for social robots working in HRI. The architecture contains a deliberative layer and a reactive layer. While the deliberative layer makes long-term plans, the reactive layer produces fast reflex behaviour reacting to stimuli. Like the previous publication, the architecture proposed by Gadanho in her PhD thesis [45] also combined deliberative and reactive processes supported by learning, adaptation, and emotion. All processes were based on biological animal functions. The system was tested in simulation in long-term trials and is considered a noteworthy advance in integrating RL into autonomous decision-making. Some years later, in 2003, Gadanho [46] updated her architecture, including perceptions, needs, emotions, and RL, to select the most appropriate behaviour to maintain optimal well-being. The emotion system modulates action selection and learning, including a cognitive system that depends on the robot’s goals and internal state.

Fig. 2

Relevant publications in the area of research presenting DMSs and control architectures for autonomous and social robots

3.1.2 The 2000s: The Rise of Social Robots

Already in the 2000s, Webb [47] proposed a theoretical overview of how robots with biologically inspired behaviour can improve our understanding of animal behaviour. In line with Webb, Bryson [48] also reviewed bioinspired theories for endowing artificial agents with autonomous and intelligent behaviour. The survey describes decision-making architectures based on modular systems, deliberation and reactiveness, and evolutionary theories. Both addressed decision-making from a conceptual point of view but provided broad strokes about biologically inspired action selection methods for artificial systems.

Moving back to architectures applied in real agents, Estlin et al. [49] presented in 2001 a novel two-layered DMS for controlling robots. The top level generates plans, and the low level works as an interface to command the robot’s actuators. Decisions are based on a set of rules that evaluate the robot’s state and goals. The model allowed action blending and continuous operation in lengthy tasks. Scheutz [50] designed in 2002 an action selection architecture for autonomous robots. The architecture was developed so as to be integrated into different robots. The decision-making process considers the robot’s emotional state and an arbitration method to select the most appropriate behaviour. Also, in real applications, Nakauchi and Simmons [51] presented in 2002 a system for social robots acting in crowded scenarios. The system allows a robot to successfully obtain resources by recognising people’s social behaviour and navigation. Then, it generates appropriate behaviours using visual information and probabilistic estimations.

The work of Cañamero over the last decades is ample and provides an accurate representation of action selection methods emulating human biological functions. In 2003, she [52] explored how to simulate emotions in artificial agents for action selection, providing notions about the essentials of modelling emotion and how emotions affect decisions. One year later, Cañamero worked with Ávila-García [53] on how to modulate action selection using hormones. Their model contemplated essential aspects of human behaviour to endow robots with autonomy, emulating key processes such as homeostasis (autonomous control of internal body functions) and motivation [54]. Their Action Selection Architecture (ASA) computes the robot’s motivational states and, using a winner-take-all approach [55], selects the behaviour linked to the motivation with the highest intensity. The behaviour selection aims to maintain the robot’s internal milieu in the best possible condition. In 2005, they continued [56] their previous research on adaptive systems, studying the role of artificial hormones in motivated behaviour and investigating how the social behaviour of autonomous artificial agents varies when two hormones, which influence how the agent perceives the resources (stimuli) necessary for survival, are modulated.
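
The following sketch illustrates the winner-take-all scheme common to this family of architectures: each motivation combines a homeostatic deficit with the perceived stimulus, modulated here by a hormone-like gain, and the behaviour tied to the strongest motivation is executed. The combination rule and the variable names are illustrative assumptions, not the exact equations of [53].

```python
def motivation_intensity(deficit, stimulus, hormone=1.0):
    """Illustrative combination rule: homeostatic deficit plus a
    hormone-modulated incentive from the perceived stimulus."""
    return deficit + hormone * deficit * stimulus

def select_behaviour(drives, stimuli, hormone_levels):
    """Winner-take-all: execute the behaviour linked to the most intense
    motivation.  The three dictionaries map motivation names
    (e.g. 'energy', 'social') to scalar levels."""
    intensities = {
        name: motivation_intensity(drives[name],
                                   stimuli.get(name, 0.0),
                                   hormone_levels.get(name, 1.0))
        for name in drives
    }
    return max(intensities, key=intensities.get)

# Example: a large energy deficit plus a visible charger wins the competition.
print(select_behaviour({"energy": 0.8, "social": 0.3},
                       {"energy": 1.0, "social": 0.2},
                       {"energy": 1.0, "social": 1.2}))
```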

Unlike the previous literature, the framework of Duffy et al. [57] in 2005 was intended both for Human–Robot and Robot–Robot interaction in humanoid and navigation domains. The DMS contains deliberative, reactive, and social components to produce the most appropriate decision based on the robot’s goal, the information gathered from the environment, and structured rules. Konidaris and Barto [58] designed in 2006 an action selection method based on RL. The model emulates physiological functions in the robot and its deficits (drives). Then, a priority system determines the most urgent drive to define motivated behaviour. Consequently, the robot aims to learn a behaviour policy to maintain an optimal internal state.

Continuing with the use of RL in decision-making, Malfaz and Salichs [59] proposed in 2006 a system for autonomous social robots. The model simulates physiological functions like thirst or hunger, and deficits in these variables arise over time. The robot’s goal is to maximize its well-being by learning which behaviour to execute depending on its internal and external situation. Moreover, the model incorporates the emotions of happiness, fear, and sadness to represent the well-being state of the robot and reinforce the learning. In 2010, they [60] extended their previous work by designing a DMS for autonomous social agents. The model grounds biological functions such as drives, motivation, and learning (RL) to allow agents to survive in a virtual world. Finally, they moved in [61] to a system more focused on the emotional component of decision-making and expressiveness.
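
A minimal sketch of how well-being can act as the reward signal in this kind of drive-based RL: well-being decreases as the homeostatic drives deviate from their ideal values, and the agent is rewarded with the variation in well-being produced by its last behaviour. The linear formulation and the constants are illustrative assumptions rather than the exact equations of [59].

```python
def wellbeing(drives, ideal=0.0, max_wellbeing=100.0):
    """Well-being drops as the drives (e.g. thirst, hunger) deviate from
    their ideal value (illustrative linear formulation)."""
    return max_wellbeing - sum(abs(level - ideal) for level in drives.values())

def reward(drives_before, drives_after):
    """Reward the last behaviour with the variation in well-being it caused,
    so actions that reduce deficits are reinforced by the learning algorithm."""
    return wellbeing(drives_after) - wellbeing(drives_before)

# Example: a behaviour that satisfies 'thirst' yields a positive reward.
print(reward({"thirst": 0.7, "hunger": 0.2}, {"thirst": 0.1, "hunger": 0.3}))
```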

Since 2005, several architectures have been developed for HRI. In this line, Michalowski et al. [62] introduced in 2006 a model for representing the engagement of people interacting with a social robot. Depending on the user’s spatial position and head pose, the robot generates a profile that determines their level of engagement. Then, the robot chooses its subsequent behaviour based on that level to continue engaging the user in the interaction. In 2008, Walters [63] presented his PhD thesis about generating behaviour in non-verbal human–robot communicative scenarios. The study contains a large set of HRI experiments where the robot chooses its actions based on the non-verbal information provided by the user during the interaction, so as to attain a well-defined social behaviour. Mohammad and Nishida [64] designed in 2009 a robotic architecture for social robots working in HRI. The system draws on neuropsychology to create complex action selection mechanisms that provide autonomous behaviour, selecting the optimal action considering sensory information and specifically selected plans. Balkenius et al. [65] studied in 2009 the interaction between motivation, emotion, and attention in social robots. They designed a control model to learn how to autonomously behave using the influence of stimuli such as objects of attention, emotion, and motivated behaviour. The model emulates cortical brain functions to represent essential aspects of human decision-making in a robotic head to learn how to map specific situations to actions.

3.1.3 The 2010s and Present: Cognitive Models for HRI

By the beginning of the 2010s, the main goal of these architectures continued to be improving HRI. Scheutz and Schermerhorn [66] developed an emotional architecture for the autonomous control of social robots. The selection of an action is grounded on each action’s utility value, defined from the evaluation of environmental cues. Thus, the goal and action selection of the robot depend on evaluating the benefits/drawbacks of executing a specific action in each situation using RL. The same authors [67] presented in 2015 a new decision-maker that accounts for violations of social norms by including predefined rules about how social robots should behave.

Like these two publications, the architecture developed by Shi et al. [68] proposes a method for creating dialogues so that a social robot can communicate verbally during HRI. The system uses a tag-based method to generate appropriate sentences and coherent dialogue. Castro et al. [69] also designed in 2010 a DMS for the social robot Maggie [70]. The system uses biologically inspired functions to represent the robot’s internal deficits and external state by perceiving the environment. Then, the robot’s motivational states grow to urge behaviour, with action selection based on the Boltzmann equation and RL.
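
The Boltzmann equation mentioned here corresponds to softmax action selection: the probability of choosing a behaviour grows exponentially with its learned value, and a temperature parameter trades exploitation against exploration. A minimal sketch with illustrative values follows; it is not the exact implementation of [69].

```python
import math
import random

def boltzmann_select(values, temperature=1.0):
    """Softmax (Boltzmann) action selection: higher-valued behaviours are
    more likely, and a higher temperature flattens the distribution,
    favouring exploration."""
    weights = [math.exp(v / temperature) for v in values.values()]
    total = sum(weights)
    r, acc = random.random() * total, 0.0
    for (action, _), w in zip(values.items(), weights):
        acc += w
        if r <= acc:
            return action
    return list(values)[-1]  # numerical safety fallback

# Example: with a low temperature the highest-valued behaviour dominates.
print(boltzmann_select({"play": 2.0, "rest": 1.0, "explore": 0.5},
                       temperature=0.3))
```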

Floreano and Keller [71] researched Darwinian evolutionary methods to endow robots with autonomous behaviour. Their idea was to build increasingly capable robots by prioritizing the information of those agents that perform better. This proposal is applied to collision avoidance in navigation tasks, with action selection performed by neural network controllers evolved through random mutation. Also for mobile robots, Buendía et al. [72] presented in 2012 a controller for the task of following a person. The engine combines the selection of a navigation strategy with a perception system that uses detected objects to generate that strategy.

Arkin et al. [73] developed in 2011 a DMS based on ethical and moral judgments for social robots. The behaviour selection module evaluates at every moment the agent’s perceptions and an interface that stores responsibilities and constraints to avoid unethical behaviour. Leite [74] addressed in 2015 how to maintain positive feelings in users during HRI. The robot’s decision-making is based on the inferred emotion of the user and on adaptive mechanisms to promote positive social behaviours in lengthy interactions. The decision-maker proposed by Scheidler et al. [75] in 2015 was intended to allow swarms of robots to operate in navigation tasks successfully. The model uses Monte Carlo RL methods [76] to produce accurate performance with fast execution time and feedback. The model presented by Qureshi et al. [77] in 2016 allowed a social robot to exhibit autonomous behaviour by learning social skills during HRI. The method uses Deep RL to obtain feedback about the robot’s action and, using trial and error, learn the best combination of actions in each situation.

By the mid-2010s, there was a new tendency in the models: biologically inspired methods began to represent more complex cognitive functions than before. The CAIO architecture developed by Adam et al. [78] is a clear example of this trend. This architecture consists of a deliberative loop that generates emotions and plans and a sensorimotor loop that evaluates external information and produces appropriate reactions. Cervantes et al. [79] proposed decision-making based on ethical behaviour. The selection of an action depends on the agent’s preferences, good and bad experiences, ethical rules, and current emotional state, drawing on studies in neuroscience and psychology. Vallverdú et al. [80] expanded the Lövheim model [81] to a more complex system in which emotional states influence the agent’s behaviour. The neurotransmitters dopamine, serotonin, and norepinephrine affect important brain regions involved in emotion and the selection of actions, varying the emotional behaviour of the agent [81].

Following the ideas previously presented by Cañamero, Cos et al. [82] designed in 2013 a homeostatic adaptive mechanism based on RL to modulate the internal deficits of a social agent. The model simulates physiological functions that evolve over time and adapt to the situation. The robot’s goal is to maintain its good physiological condition by learning which action produces the best result in each situation. Three years later, Lewis and Cañamero [83] investigated in 2016 the role of pleasure in decision-making. Using their previously presented architecture, they model a pleasure hormone that modulates the agent’s internal needs based on perceived stimuli. Then, action selection occurs using a winner-take-all approach [55]. Like the previous works, Lones et al. [84] studied in 2017 the role of epigenetic mechanisms in endowing an autonomous robot with adaptive behaviour. The model shapes the influence of different artificial physiological processes that control the energy or temperature of the robot. The errors of such variables translate into motivated behaviour.

Influenced by Cañamero’s ideas, Maroto-Gómez et al. [7] proposed in 2018 an RL model to allow autonomous social robots to learn how to behave in a dynamic environment. The robot had to learn how to maintain an optimal internal state by reducing its internal needs. The decision-making is grounded on the robot’s motivations, psychological states that represent the robot’s needs. Then, three years later, in [85], they updated their previous model with Dyna-Q+, an RL algorithm that allows autonomous agents to speed up the learning process by representing a model of the environment. In this case, the robot’s goal is to behave motivationally while maintaining its internal deficits in good condition. The decision-making process uses autonomous action selection to reduce the most prominent deficit.
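
For reference, the sketch below shows the core of a Dyna-Q+ step: a direct Q-learning update from real experience, a model update, and several planning updates from simulated transitions whose reward carries an exploration bonus proportional to the square root of the time elapsed since the state-action pair was last tried. The state/action names and parameters are illustrative assumptions, not those of [85].

```python
import math
import random
from collections import defaultdict

ALPHA, GAMMA, KAPPA, PLANNING_STEPS = 0.1, 0.95, 1e-3, 10
ACTIONS = ["approach_user", "play_game", "recharge"]  # illustrative behaviours

Q = defaultdict(float)            # (state, action) -> value
model = {}                        # (state, action) -> (reward, next_state)
last_tried = defaultdict(int)     # time step when a pair was last executed

def dyna_q_plus_step(state, action, reward, next_state, t):
    """One real step of Dyna-Q+: direct Q-learning update, model update,
    then planning with simulated transitions whose reward includes an
    exploration bonus for long-untried pairs."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    model[(state, action)] = (reward, next_state)
    last_tried[(state, action)] = t
    for _ in range(PLANNING_STEPS):
        (s, a), (r, s2) = random.choice(list(model.items()))
        bonus = KAPPA * math.sqrt(t - last_tried[(s, a)])
        best = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + bonus + GAMMA * best - Q[(s, a)])
```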

Kowalczuk and Czubenko have presented decision-making architectures for general-purpose social robots. In an initial contribution, they [86] designed in 2011 a robot controller modelling biological functions such as emotion, personality, needs, and motivation. The decisions are made using fuzzy rules that also reflect the agent’s emotional state, considering the effect of external stimuli. Then, in 2018, they [87] presented ISD (Intelligent System for Decision-making), a cognitive architecture for autonomous robots. Decisions depend on the perception of stimuli, past experiences stored in long-term memory, and the robot’s artificial needs. In addition, the model includes emotional factors, such as emotion and mood, influencing how the robot perceives objects. Finally, Kowalczuk et al. [88] developed in 2020 a fuzzy control system for autonomous, emotional social robots. Emotions and mood arise from the stimuli the robot perceives, defining its emotional state. Then, the emotional state modulates the selection of actions using the ISD cognitive architecture applied to driving scenarios.

Moving back to specific HRI domains, Romero et al. [89] used utility functions based on probabilistic rules to generate the appropriate plans of a social robot. The utility model builds upon a motivational model that represents the cognitive functions of the robot. During HRI, robots should explain their actions proactively. Stange et al. [90] addressed this issue in 2019. They presented an architecture that allows social robots to explain themselves during HRI scenarios. The robot uses verbal communication to proactively let the user know its needs and intentions. The robot’s explanations arise by considering the user’s behaviour and the robot’s needs derived from motivational processes.

Esteban and Insua [91] presented a decision-maker for social robots based on emotion generation. The robot’s emotional state depends on the interaction with people, modulating at the same time the scores associated with a set of actions. In the final step, the action with the highest score is selected to improve the robot’s performance in HRI scenarios. Cunningham et al. [92] presented in 2019 a multi-policy decision-making architecture for allowing a social robot to navigate autonomously in dynamic, multi-agent environments. The novelty of the work lies in the planning of the trajectory selected from a predefined set of closed-loop behaviours whose utility is previously calculated using a simulation process that considers complex interactions among the possible actions of the robot. Martins et al. describe in [93] a DMS based on partially observable Markov decision processes (POMDP), reward shaping, and RL. The POMDP deals with the fact that some information is not available when making decisions, using transition probabilities to select the best alternative. The reward function considers the impact of the robot’s action on the user on the fly, which was a novel technique at the time. Lastly, the RL system lets the robot know the best action to execute considering its state. Decisions are planned considering a user model and a context model that situates the robot in the environment. Compared to many other algorithms, the system provides good results in HRI tasks with different levels of complexity.
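
To make the POMDP machinery concrete, the sketch below shows a discrete Bayesian belief update and a myopic action choice that maximises the immediate expected reward under the current belief. It is only an illustration under assumed transition, observation, and reward models; it does not reproduce the full planning, reward shaping, or learning components of [93].

```python
def update_belief(belief, action, observation, T, Z):
    """Bayesian belief update for a discrete POMDP:
    b'(s') is proportional to Z[a][s'][o] * sum_s T[s][a][s'] * b(s),
    where T and Z are assumed transition and observation models."""
    new_belief = {}
    for s2 in belief:
        prior = sum(T[s][action][s2] * belief[s] for s in belief)
        new_belief[s2] = Z[action][s2][observation] * prior
    total = sum(new_belief.values()) or 1e-12
    return {s: p / total for s, p in new_belief.items()}

def greedy_action(belief, actions, R):
    """Pick the action with the highest immediate expected reward under the
    current belief (a myopic stand-in for full POMDP planning)."""
    return max(actions,
               key=lambda a: sum(belief[s] * R[s][a] for s in belief))
```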

Various contributions have been presented in HRI and in modelling cognitive development in the last three years. In this line, Man and Damasio [94] studied in 2019 the role of homeostasis [54] in the self-regulation of artificial functions in robots. Their study proposes a biological model where the robot is built using soft materials, and the way it selects its actions is oriented towards self-regulating its internal body and consciously feeling the consequences of these actions. Consequently, the selection of an action incorporates biological mechanisms based on the model of its mind to produce natural behaviour.

Then, Augello et al. [95] worked in 2020 on modelling a somatosensory system for cognitive robots, emulating how humans perceive stimuli and how these stimuli affect our selection of an action. The model uses an RL algorithm to learn the optimal behaviour to maintain the best possible internal state during HRI. The LIDA architecture developed by McCall et al. [96] in 2020 is based on motivated behaviour for the control of autonomous robots, proposing a well-defined internal system that allows the robot to behave by emotionally combining planned activities with reactive behaviours. The model uses Machine Learning to map the robot’s state to specific actions to maintain an optimal internal state without forgetting the robot’s goals.

In 2020, Hong et al. [97] addressed the problem of engaging people in human–robot scenarios, introducing a model that estimates the user’s emotional state and uses visual and auditory cues to create the robot’s emotional state. Then, using a predefined set of rules and learning based on Bayesian computing, the robot decides how to sustain engagement in bidirectional conversations. The cognitive architecture developed in 2020 by Martín-Rico et al. [98] promotes the learning of a person’s face during HRI. The action selection evaluates the situation and matches it with knowledge stored in the robot’s memory that defines its behaviour. Finally, Kim and Bodunkov [99] designed a robot architecture that makes autonomous decisions in situations where the information is not sufficient: to overcome the lack of information, the robot’s decisions are based on estimations from the robot’s situation. These estimations consider the probabilities of executing specific actions for attaining the goal during navigation tasks, using entropy as the selection criterion.

Fig. 3

Relevant publications for manufacturing applications containing DMSs for autonomous and social robots

3.2 Manufacturing

In the manufacturing sector, autonomous systems are essential in several tasks, such as logistics or production lines. However, when talking about autonomous and social robots, the literature is not as extensive as for other areas, as Fig. 3 shows. A comparative analysis of the works described in this section is in “Appendix B”.

In these scenarios, the first publication we found was by Agrawal et al. [100], who in 1991 presented a decision-making architecture for robots working in factories. The architecture addressed the problem of making decisions using a finite set of alternatives and different configuration attributes that affect how the task is performed. The system was implemented in a real application to make industrial robots work autonomously. In a similar application, Wang et al. [101] proposed in 1996 a behaviour-based model for controlling robots in factories. Its novelty resided in a DMS that considers the actions of other robots to execute a predefined task cooperatively.

From the end of the 1990s, it is possible to find many models for the autonomous control of robots that act jointly with humans. Kalenka and Jennings [102] presented in 1999 a mathematical model for the autonomous social decision-making of robots working in a warehouse, including social norms and attributes in multi-agent domains, unlike previous work. Shah et al. [103] proposed in 2002 a task controller for the autonomous and intelligent movement of vehicles and robots. The DMS combines planned actions based on a heuristic search and a database representing the world’s dynamics and reactive responses generated from the perception system.

Clodic et al. [104] presented in 2007 a DMS for human–robot collaborative scenarios. The framework is used to synchronise the communication between a social robot and a human worker during a fetch-and-carry task. The DMS uses predefined rules that evaluate the robot’s situation and the human’s speech. Czubenko et al. [105] applied the ISD architecture mentioned in the previous section to autonomous driving scenarios. The architecture emulates essential aspects of the road by replicating human drivers’ needs and motivations. To conclude with the manufacturing sector, in the context of autonomous robots and lengthy interactions, O’Brien and Arkin [106] developed a circadian system to work in agricultural tasks. The circadian functions evolve as timers to represent the system’s daily needs. Then, the action selection method uses such fluctuations in the circadian needs to execute actions using a kind of winner-take-all approach [55].

3.3 Healthcare

Among the many areas where autonomous and social robots have been applied, the healthcare sector contains decision-making architectures in real scenarios, as depicted in Fig. 4. A comparative analysis of the works described in this section is in “Appendix C”.

In healthcare, most of the work has been concentrated over the last twenty years, mainly applied to children, older adults, and assisting caregivers during therapies.

Fig. 4

Relevant DMSs for autonomous and social robots working in healthcare applications

Working with children, Dautenhahn and Billard [107] studied the effect of an autonomous social robot in healthcare applications. The robot works with children with autism in gaming and educational sessions. In related work, Feil-Seifer and Mataric [108] presented in 2008 a robot architecture for engaging children with autism disorder. Behaviour selection combines predefined behaviours with the perceptions observed from the child’s behaviour. Also in robot–child interactions, Senft et al. [109] introduced in 2015 a new model for social robots working in child therapy. The robot uses a set of rules and a homeostatic signal [54] representing the child’s engagement and previous interactions to select actions that serve the therapist during exercises. The same authors [110] updated their previous work with a DMS to assist therapists during sessions with autistic children. The method of selecting an action evaluates external stimuli and the context of the interaction to produce autonomous behaviour under the therapist’s supervision.

There has been much work on assisting caregivers to conduct therapy. For example, Hiolle et al. [111] in 2014 presented a ‘baby’ robot that adapts its emotional behaviour depending on its needs. The aim of the study was to investigate the responsiveness of a caregiver to these needs. The robot explored and learned from the environment during its life using neural networks. The robot’s selection of an action uses the perceptions and needs to define the arousal/comfort system that determines which action to take to maintain its comfort. Another example is Lones et al. [112], who presented in 2014 a hormonal system for the adaptive behaviour of social robots in HRI with a caregiver, proposing an adaptive mechanism to modulate the robot’s selection of an action depending on the stimuli perceived, the valence value defined by the impact and type of stimuli, and biological functions. The model accurately represents essential biological functions behind the behaviour, providing a robust biological basis for autonomous behaviour.

Following this line of research, Cañamero and Lewis [113] designed an adaptive framework for social robots assisting in healthcare. The robot Robin (NAO) can teach children to manage their diabetes using different activities while presenting its internal needs. The selection of an action is based on a winner-take-all approach [55] where the robot’s motivations compete to urge specific behaviours. In 2019, Lewis and Cañamero [114] presented a research model for how stress leads to compulsive behaviour. The model emulates physiological functions that are modulated by an artificial stress hormone. The deficits of these functions and the perception of resources define the robot’s motivation. Finally, these motivations urge the selection of behaviour. The robot’s stress is a function of the other hormones, which evolve depending on the robot’s deficits. Consequently, the study explores behavioural changes depending on the robot’s stress levels. The architecture designed by González et al. [115] in 2017 used a three-level hierarchical decision process to build personalised therapies in rehabilitation scenarios. In the first place, the robot generates a personalised therapy. Then, it modulates the activity using online perceptions. Finally, it translates abstract actions into specific motor commands.

Cao et al. [116] introduced in 2017 a collaborative architecture to support children and caregivers during therapy. The behaviour selection combines a hierarchical approach with parallel execution. The model generates its emotional state based on its internal needs and stimuli using a valence-arousal space. Then, each emotion is tied to a specific behaviour triggered when the corresponding emotion has the highest intensity. The architecture designed by Lazzeri et al. [117] in 2018 attempted to replicate human minds in social robots. The concept of decision-making consisted of perceiving the environment, evaluating the situation, and deciding on the most suitable action. The model was tested with children with autism disorders, conducting sessions oriented to provide entertainment and companionship. Park et al. [118] presented in 2019 a model-free emotional architecture for social robots working in education. The system uses verbal and non-verbal cues to learn engagement, promoting lengthy interactions. Using its learning capabilities, the robot selects the most relevant stories for each child, personalizing the interaction.

Many social robots with decision-making capabilities have been designed in the last few years in healthcare applications. The social robot Pepper has been used in healthcare to autonomously retrieve information from patients in a hospital [119]. During this task, the robot was guided by nurses to improve the questions that the robot asked the patients. The dialogue with the patient included questions about the patients’ home situation, general health, use of medicines, smoking, alcohol use, dental issues, weight, defecation, activities for daily living, sleep, cognition, possible stress due to recently experienced severe life events, potential problems at home or work due to their admission, and religion or belief. The idea of deploying the robot in this scenario was to make the process more interactive and easier to follow than questionnaires, while demanding less of the nurses’ time. The results showed favourable acceptance rates of the robot by both men and women (the study did not yield significant statistical differences between genders).

The social robot Mario [120] was created in 2020 to work in residential care, assisting elderly people with dementia. The robot includes a software architecture that allows it to exhibit autonomous social behaviour while engaging these adults in different activities. As in previous work, it combines deliberative and reactive layers to develop plans and reactions to unexpected situations. The social robot Mini [121] was created to assist caregivers during cognitive stimulation therapies. This robot has a fully autonomous DMS for generating personalised therapies for each user. The action selection combines RL with predefined rules that assign priorities to different possibilities, such as executing planned events or reacting to stimuli.

For the iCub robot, Tanevska et al. [122] designed in 2020 a framework to maximise pleasantness during HRI. The robot can personalise its behaviour while assisting caregivers by learning the effects of its actions on users from their social signals and from its internal needs, defined as motivational urges. In a recent study, Foster et al. [123] developed, in 2020, a social robot designed to alleviate children’s pain during medical assistance. The system’s goal was to decide on the appropriate behaviour with which to distract the child from the intervention and avoid painful and panic situations. The action selection method employs the user’s state and actions to decide on the robot’s best action.

To conclude our review of the healthcare sector, we present two up-and-coming applications in more specialised scenarios. Robinson et al. [124] used in 2020 a social robot to reduce people’s caloric intake and promote a healthy diet. The robot could perceive the consumption of snacks and analyse whether it was desirable in order to avoid binge eating. Asprino et al. [125] designed in 2022 a software architecture for the autonomous control of the social robot Mario. This robot works in healthcare applications with people with dementia. The behaviour selection evaluates the robot’s perceptions and a knowledge database containing information about objects and their influence on behaviour execution, learning to personalise HRI.

Fig. 5

Relevant publications containing DMSs in educational domains

3.4 Education

As Fig. 5 shows, many DMSs have been developed for educational environments in the last decades. A comparative analysis of the works described in this section is in “Appendix D”.

We begin our review with Dautenhahn [126], who studied in 1999 the influence of a social robot on autistic children. The robot teaches the children to perform specific activities, supervising them so that each task is fulfilled.

Breazeal [127, 128] in 2003 presented the social robot Kismet, an expressive anthropomorphic robot head intended for HRI. The robot includes mechanisms to improve social abilities and cope with complex social environments. Its decision-making involves evaluating its goals and the people’s speech to build a coherent dialogue based on predefined rules that favour learning in educational contexts. Kismet can express emotional cues.

Another platform applied to education is the iCat robot [129]. It was designed in 2005 as an autonomous robot that works in education and HRI. It incorporates a DMS that merges the information generated by an animation engine with a series of predefined scripts that contain gestures and activities that the robot executes. Similarly, the software of the social robot PaPeRo [130], presented in 2006, contains a DMS that allows it to execute autonomous behaviour in educational scenarios with children. The selection of actions merges planned activities, personalised to the audience using the robot, with reactive behaviours elicited by the perception of stimuli.

Fig. 6

Relevant publications describing DMSs for autonomous and social robots applied to entertainment

In 2008, Mitnik [131] presented a line of research for deploying autonomous social robots in educational sessions. Unlike the previous literature, the robot can teach students different subjects, such as maths or geography, by performing a set of activities together. The sessions are predefined and involve the children by promoting their participation. Ushida [132] introduced in 2010 a mind model based on emotional responses for the autonomous control of social robots. The model was intended for HRI in educational environments, containing deliberative and reactive actions to build a natural behaviour using fuzzy logic. Like the previous work, using a mental model, Strohkorb and Scassellati [133] developed in 2016 a reasoning architecture for human–robot collaboration in educational settings. The model focuses on maintaining a collaborative strategy while updating and optimizing it by gathering information from the environment. In addition, the action selection alternates exploiting the best alternative with exploring new strategies.

The following publications are examples of the impressive effects of using social robots in educational scenarios. Coninx et al. [134] presented an adaptive model for engaging children during educational sessions. Behaviour selection consists of adapting the behaviour by creating a specific profile for each child. This profile is built from feedback obtained during the execution of the exercises. Egido-García et al. [135] presented in 2020 the use of NAO robots in educational sessions with children. The model fuses the needs of the children, the caregiver, and the robot itself, to produce autonomous and personalised activities to improve the children’s logopedic skills. Mascarenhas et al. [136] designed in 2021 a new function for the FAtiMA toolkit, a model for the autonomous behaviour of socio-emotional robots in educational settings about bullying. The model makes decisions based on the exercise to be executed, the child’s emotional state, and a knowledge-based memory that stores rules linking situations to actions.

Ahmad et al. [137] introduced in 2021 an RL model for improving the engagement and vocabulary learning of children. The decision-making uses social signals and memory-based knowledge to determine the best action to execute during the session. Kaptein et al. [138] addressed in 2021 the design of a DMS for lengthy interactions for educating children about healthy lifestyles using games. The action selection occurs in two stages, using an ontology-based system and evaluating the best action according to the robot’s current situation. In addition, the system includes learning methods to personalise each child’s activities to improve performance.

3.5 Entertainment

Some work has used social robots with autonomous decision-making in entertainment, as shown in Fig. 6. A comparative analysis of the works described in this section is in “Appendix E”.

Gu et al. [139] proposed in 2003 a DMS based on fuzzy logic for humanoid mobile robots in entertainment scenarios like the RoboCup. Kok et al. [140] presented in 2003 a DMS based on coordination graphs for robots working in multi-agent entertainment domains. The decision of each robot depends on the decisions and actions of the others, producing a coordinated sequence of behaviours. Using a biologically inspired model, Manzotti and Tagliasco [141] developed in 2005 a decision-maker based on motivations for robots. Unlike the previous literature, the motivations do not emerge from purely biological functions, but from the robot’s goals. The robot, intended for entertainment, generates motivated behaviour from the stimuli perceived from the environment and rules stored in a memory.

Fig. 7

Relevant publications describing decision-making architectures for companion robots over the last three decades

Four engaging autonomous platforms were designed for children’s entertainment. Kozima et al. [142] presented Keepon in 2009 as a social robot for research, entertainment, and therapy. The robot includes a decision-making module that evaluates its situation and the actions of people to produce autonomous decisions adapted to the interaction procedure. The social robot Pleo [143] was conceived in 2010 by Fernaeus et al. as a toy robot for children’s entertainment. It presents an autonomous action selection mechanism based on predefined rules adapted to external stimuli. In a similar scenario, the social robot Maggie [144] was also used as a gaming platform with children. The social robot MiRo [145] appeared in 2015 as a research platform for entertainment. The robot incorporates multiple sensors to navigate the environment, executing various expressive behaviours autonomously. Its action selection consists of predefined rules that map external stimuli to specific behaviours.

Kaupp et al. [146] introduced in 2010 a decision-making framework for human–robot interactive collaboration. The goal of the model is to appropriately decide what and when to communicate with the human operator to complete semi-teleoperated navigation tasks successfully. After assessing environmental stimuli and the operator’s commands, decisions are made using probabilistic formulae. Bicho et al. [147] presented in 2011 a DMS for HRI based on neural networks. The system is intended for entertainment activities like building a toy in a cooperative task. The action selection uses the robot’s perceptions and goals to decide on the HRI flow. Schneider et al. [148] proposed in 2017 a controller for a social robot working in HRI. The system encourages people to exercise for more extended periods by promoting motivational behaviour. The action selection depends on a set of rules that combine the features of the people and the inputs of the perception system.

Bagheri et al. [149] presented in 2021 a framework based on RL to motivate users during human–robot entertainment activities. The robot’s action selection depended on the emotional state inferred from the participant so as to be empathic and improve the users’ confidence and satisfaction. Saunderson and Nejat [150] presented in 2022 a hybrid hierarchical decision-maker to persuade people to do their daily exercises. The robot uses different RL algorithms and user identification to personalise the exercises of each user during lengthy HRIs depending on their emotional state. Maroto-Gómez et al. [151] presented in 2022 a DMS for the social robot Mini working in entertainment. The model uses estimations based on Preference Learning to propose the user’s favourite activities. Action selection uses the Boltzmann equation, which balances selecting the user’s favourite activities with exploring new alternatives.

3.6 Companionship

The use of autonomous social robots to provide older adults with companionship has been explored since the beginning of the century, as Fig. 7 shows. A comparative analysis of the works described in this section is in “Appendix F”.

Undoubtedly, the most famous robot in this area is PaRo [152], a robot for physical and emotional interaction with people with mental impairments. The robot’s decision-making works as a reactive system that produces actions after evaluating the perceptions sensed from the environment. Similarly to PaRo, Arkin et al. [153] presented in 2003 a promising model for the intelligent decision-making of the social robot Aibo. The model emulates the physiological and emotional processes occurring in a dog, thus endowing the robot with intelligent and autonomous behaviour so that it can provide companionship as a cybernetic pet. The robot’s goal is to maintain homeostasis and regulate its internal deficits [54] to survive in a changing environment. In addition, the robot incorporates learning mechanisms to associate objects with certain biological variables (e.g. food with hunger) and to identify people’s faces. Saldien presented in 2009 the social robot Probo [154], an autonomous agent for entertaining people in hospitals and providing them with companionship. Its behaviour can be manually tuned through a user-friendly interface, while the robot also produces spontaneous reactions.
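To make the homeostatic idea concrete, the following minimal sketch (illustrative Python of our own, not the model of [152,153,154]; the variables, decay rates, and stimulus effects are invented) keeps a set of simulated physiological variables near their ideal values and returns the deficits a decision-maker could use to trigger corrective behaviours.

    # Each internal variable decays over time; perceived stimuli restore it
    # towards its ideal value, and the remaining deficits drive behaviour.
    IDEAL = {"energy": 1.0, "social": 1.0}
    DECAY = {"energy": 0.01, "social": 0.02}                     # per time step
    EFFECTS = {"food": {"energy": 0.3}, "petting": {"social": 0.4}}

    def homeostatic_step(state, stimuli):
        """Advance the internal state one step and return the current deficits."""
        for var in state:
            state[var] = max(0.0, state[var] - DECAY[var])           # natural decay
        for stimulus in stimuli:
            for var, gain in EFFECTS.get(stimulus, {}).items():
                state[var] = min(IDEAL[var], state[var] + gain)      # restoration
        return {var: IDEAL[var] - state[var] for var in state}

    internal_state = {"energy": 0.8, "social": 0.5}
    print(homeostatic_step(internal_state, stimuli=["petting"]))     # remaining deficits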

Turning now to decision-making architectures developed for companion robots, Samani and Saadatian [155] proposed an action selection architecture for social robots based on the Probabilistic Love Assembly (PLA) emotional model. The selection of an action is based on the evolution of artificial hormones, yielding different emotional states. These hormones evolve depending on the social interaction with the user, making the robot adapt its emotions and establish a social relationship with the user based on love. Grigore et al. [156] designed in 2015 a motivational model for the adaptive autonomous behaviour of social robots working as companions. The action selection mechanism is based on RL and chooses appropriate actions depending on a user model representing daily goals.

Fig. 8 Important publications with autonomous DMSs applied to assistance and services domains

3.7 Assistance and Service

The literature review presented in this manuscript has shown that there is a wide range of applications where autonomous social robots assist humans in different tasks. A comparative analysis of the works described in this section is in “Appendix G”.

Most of the previous work describes DMSs that, to a greater or lesser extent, facilitate the execution of different tasks by humans. We now focus on systems providing purely assistive behaviour, working as tour guides, bartenders, or office assistants. Figure 8 shows the evolution of the most important work in assistance and service over the last three decades.

The social robot Minerva [157] was created in 1999 by Thrun et al. as a robot tour guide. It exhibited autonomous social behaviour combined with a user interface through which visitors could indicate to the robot what to do (e.g. visit a specific location). Jung and Zelinsky [158] proposed in 1999 an action selection method for two cooperative robots executing cleaning tasks. The action selection dynamically generates paths depending on the previous action and on a set of rules that actively inhibit the robot’s possible alternatives. That same year, Van der Loos et al. [159] developed a controller for a manipulator assisting people with a physical disability. The controller uses probabilistic rules to enable the robotic arm to help the user autonomously.

Lisetti et al. [160] designed in 2004 a decision-making architecture for HRI. The system was integrated into the service robot Cherry, which could express different emotions to improve its social abilities while assisting people as an office assistant. The decision-making module evaluates the robot’s emotional state to make the most appropriate decision. Similarly, the social robot Maggie [70] was designed in 2006 to work in multiple domains, such as entertainment, assistance, and education. In addition, it served as a research platform to study HRI. Its DMS combines a deliberative layer that plans based on the robot’s goals and a reactive layer that responds to environmental stimuli. The robot also employs learning, adaptive, and emotional mechanisms to improve its performance and engage users.

Hollinger et al. [161] proposed in 2006 a decision-maker for mobile social robots based on emotion. The robot was designed to work in conference assistance using predefined functions that mapped stimuli to emotional actions. The goal of this system was to improve HRI by including reactive behaviour to engage users and improve people’s acceptance. Like Minerva [157], the robot Urbano [162] was introduced in 2008 to work as a museum tour guide. Its decision-making consists of three heuristic search algorithms combined with fuzzy rules to produce the best possible presentation for the audience. Shiomi et al. [163] designed in 2009 a DMS to control the actions of a group of robots assisting users in a shopping mall. Based on predefined rules and estimates of users’ behaviour, the system generates appropriate instructions for each robot to provide information about routes and recommendations. Therefore, the decision-maker coordinates each robot’s HRI and navigation to approach different users.

Alili et al. [164] introduced in 2009 a decision planner for human–robot collaborative scenarios. The action selection is based on a probabilistic model that evaluates the robot’s goal and the perception system (including the human’s intention) to make appropriate decisions in different assistive tasks. In 2014, Foster et al. [165] showed how a bartender robot could autonomously work in complex scenarios with customers. In this case, action selection relied on learning an RL policy to meet the customers’ needs. Petrick and Foster [166] presented in 2016 their work on planning-based autonomous action selection in HRI. The robot acts as a bartender attending to customers’ requests. Actions are selected by perceiving such requests and comparing them with a predefined set of rules that indicate how the robot should behave.

Liu et al. [167] showed in 2018 how a robot could learn proactive behaviour using neural networks and user feedback during HRI. The robot controller generates both motion and speech actions using associations created by the learning model. Similarly, Malviya et al. [168] developed in 2020 a navigation system for social robots based on a social finite state machine. The robot operates as a tour guide, using its onboard sensors to trigger suitable state transitions and exhibit fully autonomous behaviour. To conclude our review, in 2021, Hedblom et al. [169] presented an action selection method based on image schemas. The system evaluates logical rules that allow the agent to decide how to behave to attain a specific goal. The architecture allows autonomous social robots to carry out everyday household activities.
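A social finite state machine of the kind used for such tour-guide behaviour can be sketched compactly. The example below is illustrative Python with invented states and events, not the controller of [168]; it only shows how perceived events drive transitions between high-level behaviours.

    # Hypothetical tour-guide behaviours and the events that trigger transitions.
    TRANSITIONS = {
        ("waiting", "visitor_detected"): "greeting",
        ("greeting", "tour_accepted"): "guiding",
        ("greeting", "tour_declined"): "waiting",
        ("guiding", "waypoint_reached"): "explaining",
        ("explaining", "explanation_done"): "guiding",
        ("guiding", "tour_finished"): "farewell",
        ("farewell", "visitor_left"): "waiting",
    }

    def next_state(state, event):
        """Return the next behaviour, staying in the same state if the event is irrelevant."""
        return TRANSITIONS.get((state, event), state)

    state = "waiting"
    for event in ["visitor_detected", "tour_accepted", "waypoint_reached"]:
        state = next_state(state, event)
    print(state)   # -> explaining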

4 Analysis

In this review of DMSs and control architectures for autonomous and social robots, we evaluated 148 publications (out of 208 screened) derived from the study depicted in Fig. 1. We are aware that there is more literature in this area, but we believe that these publications accurately represent the evolution of these software architectures over the last thirty years. As Fig. 9 shows, the number of publications has increased over the years, reflecting the unceasing growth of autonomous and social robots in our society.

Fig. 9 Percentage of papers reviewed in this manuscript per decade

In the upcoming sections, we investigate this evolution regarding:

  • Area of application.

  • Action selection method.

  • Duration of the HRI experiments (if specified).

  • Biologically inspired models included (if any).

  • Learning method used to produce decisions (if included).

  • Number of publications concerning real scenarios.

  • Number of publications concerning a specific robotic platform.

Fig. 10 Number of publications per area of application. Most DMSs developed to date have been applied to research scenarios (\(52\%\)). To a lesser extent, some have been applied to healthcare (\(\sim 13\%\)), entertainment (\(\sim 9\%\)), assistance (\(\sim 9\%\)), education (\(\sim 8\%\)), companionship (\(\sim 3\%\)), or manufacturing (\(\sim 5\%\))

Fig. 11 Number of publications per area of application and decade. Applications to research are predominant in all decades. However, the number of publications where DMSs are applied to other areas has significantly increased over the last decade

In all of these analyses, we first study the global distribution of the items in each category over the last three decades and then deepen the analysis, providing a detailed view decade by decade.

4.1 Area of Application

In this study, we assessed the area in which each work was applied. We classified each publication into one of the seven categories presented in Sect. 3. Although some works could belong to several categories, we assigned each paper to its most relevant category to analyse whether the areas of application have varied over the last three decades.

Figure 10 shows that, of the 148 publications studied, 77 concerned applications for purely research purposes (\(52\%\)), 19 healthcare (\(\sim 13\%\)), 14 each entertainment and assistance/services (\(\sim 9\%\) each), 12 education (\(\sim 8\%\)), 7 the manufacturing sector (\(\sim 5\%\)), and 5 robots that provide companionship (\(\sim 3\%\)).

If we deepen the analysis and review the last three decades (the 1990 s, 2000 s, and 2010 s to the present) in detail, we obtain revealing results. As Fig. 11 shows, in the 1990 s, most work was not applied to a particular area, although some was applied to manufacturing, production, or assistance. However, with the rise of social robots, the work has taken a more specific turn, especially towards healthcare, entertainment, and education. This does not mean a lack of work in research, since many systems are still applied in this context.

4.2 Action Selection Method

It is important to emphasise that we have been reviewing decision-making and control systems for autonomous and social robots over the last three decades. These systems are characterised by featuring action selection methods that produce autonomous decisions. In our study, we distinguish four main types of action selection:

  • Biologically inspired methods: We include in this category those methods that take inspiration from biology by using emotions, homeostasis, or motivation to influence action selection.

  • Probabilistic and classical algorithms: In this category, we include methods that base their action selection on probabilistic algorithms and classical approaches that do not involve learning (e.g. heuristics, genetic algorithms, or support vector machines).

  • Learning methods: Methods that incorporate some kind of learning, such as RL, Deep Learning, neural networks, or learning by demonstration/imitation, fall into this category.

  • Fuzzy control and predefined rules: We classified in this category systems that use fuzzy logic and predefined rules (e.g. if–then rules) to make autonomous decisions that fulfil the system’s task (a minimal example of this category follows the list).
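To illustrate the last (and simplest) of these categories, the fragment below is a minimal, hypothetical if–then action selector written in Python. Real systems in this group map many more perceptual features to behaviours, often through fuzzy membership functions rather than the crisp thresholds used here.

    def select_action(percepts):
        """Toy predefined-rule selector: the first matching rule wins."""
        rules = [
            (lambda p: p.get("user_greeting", False),       "greet_back"),
            (lambda p: p.get("battery", 1.0) < 0.2,         "go_to_charger"),
            (lambda p: p.get("user_distance", 10.0) < 1.5,  "start_interaction"),
        ]
        for condition, action in rules:
            if condition(percepts):
                return action
        return "idle"

    print(select_action({"battery": 0.15, "user_distance": 3.0}))   # -> go_to_charger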

Figure 12 shows the distribution of the publications included in our review by action selection method. It is noteworthy that every publication uses at least one of the action selection methods studied, and some incorporate more than one. The most widely used approach is biologically inspired models (70 of 148) to drive action selection, followed by learning methods (63). Almost one-third of the publications (47) include probabilistic and classical approaches to decide which action the social robot should execute, and 36 base this decision on fuzzy control or predefined rules.

Fig. 12 Number of publications using each action selection method. No single action selection method clearly dominates, although biologically inspired models (\(\sim 47\%\)) are the most used technique, followed by learning (\(\sim 43\%\)), probabilistic and classical methods (\(\sim 32\%\)), and fuzzy and rule-based methods (\(\sim 24\%\))

The analysis by decade shown in Fig. 13 reveals some valuable trends. On the one hand, the number of publications basing action selection on learning methods has increased significantly in recent years, probably due to the expansion and development of Machine Learning and Artificial Intelligence. On the other hand, since the 1990 s, biologically inspired methods have been a powerful source of inspiration for developers; modelling animal (and human) biological functions remains a widespread technique, primarily when the robot works in HRI. Lastly, the figures for more classical approaches, such as probabilistic or fuzzy/rule-based control, remain stable over time, and they continue to be a good alternative for developers.

Fig. 13 Number of publications using each action selection method per decade. Both the learning and biologically inspired approaches have been gaining importance in recent years, although more classical methods such as fuzzy logic and probabilistic models are still used

4.3 HRI Experiment Duration

In addition to the previous analysis, we investigated the duration of the HRI sessions in the experiments reported in these publications. It is worth noting that some publications did not include experiments or did not specify their duration. In this regard, our analysis finds that 114 of the 148 publications specified the duration of their HRI experiments, while 34 did not validate the model in HRI or did not indicate the experiment duration. Based on this, we divided them into three types:

  • Short: Experiments with a single interaction which lasted less than an hour.

  • Moderate: Experiments with interactions that lasted more than an hour but occurred on the same day.

  • Long: Experiments that included HRIs on different days with an average duration greater than one hour.

In this assessment, represented in Fig. 14, we observed that most publications only report short HRIs (81 of 119) and only some involve moderate scenarios (25). It stands out that there is a lack of systems (7) working in real scenarios where lengthy HRIs are required. We believe that if autonomous and social robots are to be deployed in real environments, exhibiting autonomous behaviour for long periods is essential; otherwise, the investment and development these systems require is not worthwhile.

Fig. 14 Number of publications per HRI duration type. Most works use DMSs only in short interactions (\(\sim 55\%\)), some use moderately long interactions (\(\sim 17\%\)), and few are used for long interactions (\(\sim 5\%\)). The remaining works do not specify the duration or do not test their models (\(\sim 23\%\))

Analysing the previous results by decade (Fig. 15), it is possible to see that short interactions predominate. Although some publications target moderately long interactions and a few target lengthy ones, there is no clear tendency suggesting that DMSs and control architectures are extending their usability in this regard.

4.4 Use of Biologically Inspired Models

In the previous section, we saw that many publications (a total of 70) employ biologically inspired models to shape decision-making. This section explores these biologically inspired models, further investigating how animals’ biological functions are emulated in autonomous and social robots. The literature review presented in Sect. 3 allows us to recognise four different kinds of biologically inspired models:

  • Homeostatic model: the emulation of animal (human) biological functions to influence decision-making, such as neuroendocrine responses, homeostatic and allostatic control [54], or physiological variables (e.g. heart rate).

  • Motivational model: the use of motivations as psychological states that impel the agent’s behaviour (see the sketch after this list).

  • Affective model: emotional models based on emotion, mood, and personality which influence how the autonomous robot makes decisions.

  • Cognitive model: modelling cognitive and mental functions in robots.
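As a hedged illustration of how the homeostatic and motivational models listed above are often combined, the sketch below (our own simplification in Python, not drawn from any single surveyed system) computes a motivation intensity as an internal deficit plus an external incentive and lets the strongest motivation drive behaviour selection.

    def dominant_motivation(deficits, incentives):
        """Intensity of each motivation = internal deficit + incentive of related stimuli."""
        intensities = {
            m: deficits.get(m, 0.0) + incentives.get(m, 0.0)
            for m in set(deficits) | set(incentives)
        }
        return max(intensities, key=intensities.get), intensities

    deficits = {"energy": 0.6, "social": 0.2}     # e.g. produced by a homeostatic model
    incentives = {"social": 0.5}                  # e.g. a person has just been detected
    winner, intensities = dominant_motivation(deficits, incentives)
    print(winner)                                 # -> social (0.7 beats 0.6)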

Fig. 15 Number of publications per HRI duration type per decade. Although moderately long interactions have gained importance over the last two decades, it seems that long interactions are not gaining enough importance to deploy robots in real scenarios

Figure 16 shows the distribution of publications presenting biologically inspired models to select actions. Of the 70 publications that included at least one of these models, 39 (\(\sim 26\%\) of all reviewed works) include a homeostatic internal model that emulates physiological functions, another 39 (\(\sim 26\%\)) include an affective model where decisions depend on the emotional state of the autonomous agent, 32 (\(\sim 22\%\)) use motivations to drive behaviour selection, and 23 (\(\sim 16\%\)) model cognitive functions, typically inspired by the functions of the brain and mental models. This suggests that most systems implement the model that best suits the goal they want to reach, but only a few publications attempt to study the relations between cognitive and emotional functions.

Fig. 16 Number of publications implementing each type of biologically inspired model. As shown, the distribution is fairly even, since many works combine more than one model

If we analyse the previous results by decade (Fig. 17), we can observe a large increase in the use of cognitive models in the last ten years compared with the two previous decades. The homeostatic, motivational, and affective models present a homogeneous pattern in which none stands out above the others.

Fig. 17 Number of publications implementing each type of biologically inspired model per decade. In the 2010 s, the number of applications of cognitive models increased significantly. The other approaches maintained a stable distribution over the years

4.5 Use of Learning

Many of the publications (a total of 63) included in this review present learning models to improve the system’s decisions. This section explores the methods most used to gain experience and improve performance during autonomous action selection. Since some systems may combine different learning methods, each system can be classified into more than one of the following categories.

  • Reinforcement learning (RL): the decision-making is shaped by learning from trial and error and from past experiences (see the sketch after this list).

  • Neural networks (NNs): systems that use neural networks to learn action selection, including Deep Learning, convolutional networks, and similar techniques.

  • Learning by imitation/demonstration: the systems gain knowledge for action selection by imitating other agents or after seeing a demonstration.

  • Other techniques: systems that include learning methods to improve action selection using other approaches, such as genetic programming or heuristic search.
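Since RL dominates this category, the following sketch shows the standard tabular Q-learning update in Python. It is a generic textbook form with invented states, actions, and rewards, not the learning module of any surveyed system, but it captures the trial-and-error rule these decision-makers typically rely on.

    import random
    from collections import defaultdict

    ACTIONS = ["greet", "play_game", "rest"]
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2          # learning rate, discount, exploration
    Q = defaultdict(float)                         # Q[(state, action)] -> estimated value

    def choose(state):
        """Epsilon-greedy selection over the current Q estimates."""
        if random.random() < EPSILON:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])

    def update(state, action, reward, next_state):
        """One-step Q-learning update from an observed transition."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # Toy interaction step: the user smiled after a game, giving a positive reward.
    update("user_present", "play_game", reward=1.0, next_state="user_engaged")
    print(Q[("user_present", "play_game")])        # -> 0.1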

Fig. 18 Number of publications using each kind of learning technique for action selection. RL is by far the most used technique (\(\sim 26\%\)), followed by NNs (\(\sim 9\%\)), imitation/demonstration (\(\sim 3\%\)), and other approaches (\(\sim 8\%\))

Fig. 19 Number of publications using each kind of learning technique for action selection per decade. As expected, RL is the most used technique in the last decade, although in the early 1990 s, learning was carried out using other methods based on probabilities or genetic algorithms

The distribution of publications by learning method represented in Fig. 18 shows that RL wins by a landslide: 38 publications (\(\sim 26\%\)) include RL to improve decision-making, something we believe is closely related to the spread of social robots in the last two decades. Regarding the other alternatives, neural networks are also widely used, appearing in 13 of the 148 works (\(\sim 9\%\)), followed by other approaches, such as genetic programming (\(\sim 8\%\)). Lastly, a few publications use learning by imitation and demonstration (5 of 148, \(\sim 3\%\)), two techniques that are not very common in social robotics.

Our previous hypothesis relating the spread of social robots to RL techniques is reinforced if we analyse the distribution by decade. As Fig. 19 shows, the number of publications using RL in the last decade has increased significantly. In this graph, it is also possible to see an overall increase in the number of publications using some kind of learning, probably due to the expansion of Machine Learning and Artificial Intelligence in the last two decades.

4.6 Application to Real Scenarios

This section analyses whether the work reviewed in this manuscript has been tested in real scenarios or has remained at the level of conceptual design and simulation.

Fig. 20 Number of publications concerning real scenarios. In the last three decades, most results (\(\sim 58\%\)) have been tested in real scenarios

Fig. 21 Number of publications applied to real scenarios per decade. In the 1990 s, most publications described DMSs tested in simulation. However, since the 2000 s, this tendency has changed

Figure 20 shows that \(\sim 58\%\) of the publications have presented tests in real scenarios over the last three decades. Deepening the analysis, Fig. 21 shows that the initial trend has reversed. In the 1990 s, most architectures were used in simulation or as conceptual designs (18 in simulation vs 15 in real scenarios). However, in the 2000 s, most architectures were tested in real scenarios (20 vs 18 in simulation). Finally, from the 2010 s to the present, twice as many architectures have been tested in real scenarios as in simulation (52 vs 26). This suggests that most decision-making and control architectures are currently tested in real scenarios where people participate. These results align with the fact that current systems are applied to more specific tasks.

4.7 Systems Designed for Specific Platforms

This section investigates whether the decision-making and control architectures reviewed in this survey have been designed for a specific robot or, on the contrary, are general architectures designed to work across multiple platforms.

Fig. 22 Number of publications where the DMS is designed for a specific robot. Most results are designed for general platforms (\(\sim 63\%\)) instead of for a particular system (\(\sim 36\%\))

Fig. 23 Number of publications where the DMS is designed for a specific robot per decade. The tendency of the last three decades is to design DMSs that work in general domains and not only for a specific robot

As Fig. 22 shows, most of the architectures are designed to be integrated into general platforms (\(\sim 63\%\)) rather than into specific ones (\(\sim 37\%\)). The analysis per decade (Fig. 23) supports the general results: in every decade, DMSs and control architectures designed for general domains outnumber those developed for specific robotic platforms.

5 Challenges to Autonomous and Social Robots

The literature review and the previous analysis have provided a concise overview of the benefits of social robots in different domains. However, these architectures also face important challenges that should be addressed to continue developing increasingly capable systems.

5.1 Engagement in Lengthy Interactions

Our previous analysis shows that the design of autonomous systems for lengthy interactions has been scarce over the last three decades, even though deploying autonomous and social robots in such interactions is a clear goal. Although it seems that social robots are starting to work in real scenarios assisting people in many services [15], our results indicate that most research focuses only on testing these systems in controlled environments where the HRI lasts only a few minutes. In this regard, most recent work addresses how to engage users in the interaction, principally during cognitive stimulation, physical activities, and educational exercises, to avoid fatigue.

We believe that researchers in Artificial Intelligence who are working on designing robust control architectures for social robots should be aware of this issue and concentrate on designing novel action selection architectures for extended periods rather than for customised sessions. Thus, testing these architectures in real and unpredictable environments over long periods is essential to measure their real applicability and usability.

5.2 Multi-applicability

In line with the previous challenge, our study suggests that the application of DMSs has become more specific in recent decades. Initially, most proposals concerned control architectures with conceptual designs that were not applied to any specific area. Nonetheless, the number of publications describing applications of autonomous behaviour in healthcare or education has increased significantly in recent years. Although we can see this applicability as a positive fact, we are still far from developing robust systems that can be used in multiple and diverse domains. Consequently, a significant challenge that researchers will face in the upcoming years is to design intelligent machines with autonomous action selection methods that can be used both in specific tasks and in a wide repertoire of activities to assist people. Thus, we believe that these robots will reach a broader pool of potential customers if they are oriented to a wider target population.

5.3 Adaptation and Learning

Reinforcement learning has become the most used learning technique in the last decade [170]. The possibility of learning how to behave from trial and error opens a wide range of possibilities for building capable machines. In this context, more and more publications present control architectures that incorporate some kind of learning or adaptive system. However, most of them focus only on adapting to users for whom the system already has predefined information, so adaptation is not fulfilled when the robot faces unknown users or requires lengthy training times.

In these situations, we propose generalisation methods based on predictions that, after dynamically obtaining the necessary information from the user through HRI, can estimate essential features and attributes to start the assistance with some degree of adaptivity rather than from scratch. Then, during subsequent interactions, the system can make autonomous action selections combining the initial estimates with new adjustments that accurately represent the user’s preferences to improve the quality of the HRI and meet the initial goals. In this sense, the challenge for DMSs is to integrate recent and adaptive learning methods to improve the robot’s behaviour selection.
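One possible, purely illustrative way to realise this idea is sketched below in Python: initial preferences for an unknown user are predicted from the profiles of previously known users and then nudged by feedback gathered during interaction. The profile features, activities, and update rule are invented for the example and are not part of any system reviewed here.

    # Known users: simple profile features plus the activity preferences learned for them.
    KNOWN_USERS = [
        {"age": 72, "tech_affinity": 0.2, "prefs": {"music": 0.8, "quiz": 0.4}},
        {"age": 35, "tech_affinity": 0.9, "prefs": {"music": 0.3, "quiz": 0.9}},
        {"age": 64, "tech_affinity": 0.4, "prefs": {"music": 0.7, "quiz": 0.5}},
    ]

    def predict_initial_prefs(profile, k=2):
        """Cold-start estimate: average the preferences of the k most similar known users."""
        def distance(user):
            return (abs(user["age"] - profile["age"]) / 100.0
                    + abs(user["tech_affinity"] - profile["tech_affinity"]))
        neighbours = sorted(KNOWN_USERS, key=distance)[:k]
        activities = {a for user in neighbours for a in user["prefs"]}
        return {a: sum(user["prefs"][a] for user in neighbours) / len(neighbours)
                for a in activities}

    def refine(prefs, activity, feedback, rate=0.3):
        """Move the estimate towards the feedback observed during interaction."""
        prefs[activity] += rate * (feedback - prefs[activity])
        return prefs

    prefs = predict_initial_prefs({"age": 68, "tech_affinity": 0.3})
    prefs = refine(prefs, "quiz", feedback=0.9)    # the user clearly enjoyed the quiz
    print(prefs)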

5.4 Lack of General Models of Shared Knowledge

Lastly, we would like to highlight a significant problem that is typically overlooked but affects the evolution of these systems. As we saw in Sect. 3, there is a large number of publications presenting DMSs for autonomous and social robots that execute a wide range of behaviours. However, in most cases, researchers design their approaches without considering integrating other researchers’ models into their own and taking advantage of previous research. This issue is probably due to a significant lack of knowledge sharing and of publicly released code. Although there seems to be a tendency for new researchers to share their DMSs to improve scalability and modularity, we believe we are still far from developing software solutions (in this case, DMSs for social and autonomous robots) that can be generally and easily implemented on different platforms to speed up technological growth.

6 Conclusion: The Future of DMSs

This manuscript started with a thorough review of the evolution of DMSs and control architectures for autonomous and social robots over the last thirty years. Then, we analysed the most important trends in this work to provide a concise picture of the fundamental challenges that must still be addressed to deploy these systems in real and lengthy applications.

We believe that social robots can provide multiple benefits to society, alleviating the workload of, and facilitating the execution of tasks by, the most vulnerable sectors of society, such as older adults, children, or disabled people. Additionally, the ageing population of developed countries highlights the need for intelligent and autonomous machines with robust behaviour to compensate for a possible shortage of workforce in specific positions related to healthcare or education.

These challenges push us to continue investigating along these research lines to provide solutions that can improve people’s quality of life.