1 Introduction

Robots are currently being studied and are expected to be used in a wide range of social applications [1]. Zviel-Girshin et al. envisaged a rapid increase in robots in modern society to the point that most children will potentially be surrounded by a robotic environment [2]. According to Beran et al., infants and children are increasingly playing with robotic technologies during their playtime [3]. Consequently, investigating infant-robot interaction (IRI), as well as robots' influence on young children's cognition, learning, language, and social and moral development, is crucially important [4]. Although studies have been conducted to investigate these aspects in children [5,6,7], little is still known about robots' influence on infants.

Recent studies have used robots as a tool for delivering early interventions to address developmental disability in infants [8, 9]. These studies corroborated the ability of robots to inspire and encourage infants to imitate desired patterns of movement [8, 9]. The widespread literature on mirror neurons [10] further suggests that interactions with an infant-sized humanoid robot may lead infants to imitate and practice key motor skills such as standing and walking. These findings highlight the importance of promoting research on robotic technologies for infants aimed at providing complementary support to human-administered therapy.

A key component of robot interactions with infants is the use of visual stimuli presented via robot behaviors [11]. Robots must be capable of reliably capturing infants' visual attention in order to teach and reinforce infants' actions; to this end, robots need a method to gain infants' attention. In addition, once gained, the infant's attention must be maintained. This includes the robot's understanding of the infant's current psychophysiological and emotional states, as well as the determination of the optimal course of action depending on the infant's current situation. Indeed, a seamless infant-robot interaction requires the infant to be motivated to follow the robot's actions (such as leg movements or eye gaze) [1, 12, 13]. Therefore, it is recommended that the robot positively impacts the infant's curiosity [3] and first engages the infant in some form of social interaction or socially intelligent behavior [12].

The quality of the interaction between infants and robots, and the analysis of the robots' influence on infants, can be investigated by detecting infants' affective cues during such interaction. For this reason, the social robotics field is supported by affective computing, a branch of the data mining field aimed at providing effective and spontaneous interaction between humans and devices [14]. One of its primary goals is to enable systems to understand the emotional states expressed by human subjects so that personalized responses can be delivered accordingly [15]. Specifically, affective computing focuses on the study and development of systems and devices that can identify, interpret, process, and simulate human affects [16]. The machine should be able to detect human emotional states and modify its behavior to respond appropriately to those emotions. Therefore, reliable robotic emotion detection remains the cornerstone of affective computing [17].

By contrast, although affective computing during infants' interaction with robotic systems is a fundamental task in the IRI field, researchers in this area still struggle with how to address it. This is mainly because emotions in infants have been principally studied by analyzing vocal or facial expressions. In fact, over the first few years of life, children develop patterns of facial, vocal, and behavioral (i.e., bodily) expressions that allow them to communicate their feelings and adjust those communications according to the situation [18, 19]. However, the models of infants' emotional phenomena based on facial or vocal expression are built on varying theoretical assumptions that include anatomical and biological aspects as well as different theories about the cause and purpose of emotions. Transforming these theories into a computational model of emotion is a difficult endeavor in and of itself.

Other than vocal or facial expressions, emotions can be detected from physiological signals. Efforts toward emotion recognition through physiological markers are evident in many studies using electrocardiography (ECG) [20], electroencephalography (EEG) [21], functional near-infrared spectroscopy [22, 23], skin conductance [24], and thermography [25]. The drawback of working with physiological readings is the personal access required to measure them. Direct contact with the infant's body is needed to acquire most readings, but the contact sensors used for measurement may perturb the infant, potentially biasing the results. To favor the ecological dimension of infant-robot interaction, it would be desirable to assess psychophysiological and emotional states non-invasively. To this end, thermal infrared (IR) imaging, a contactless technology that enables monitoring human autonomic activity and inferring psychological and affective states without constraining the subject [26,27,28,29], has recently been introduced in the IRI field. Finally, since infants' emotions and drives play essential roles in generating meaningful interactions [30], the study of infants' affective states during robot interactions can lead to new knowledge in the field of developmental psychology and inspire new insights for the improvement of developmental robotics [31]. Indeed, the emerging area of developmental robotics is oriented toward the advancement of robotics by attempting to reproduce infant-like behavior and learning.

1.1 Structure and Aim of the Study

The study’s purpose is to evaluate and assess the technologies and procedures used for infant affective states recognition that are most significant to the IRI field and further investigate the main research areas of affective computing in IRI applications. The current infant affective state recognition technologies have been surveyed in terms of accuracy achieved, their suitability in the IRI field as well as their potentialities and limits. In this sense, the study is intended to offer a potential solution to overcome the existing difficulties in this area and provide new perspectives towards a successful interaction between infants and robots. Furthermore, the study provides a perspective about future developments of emotion-aware robots.

This work is structured as follows. Section 2 describes the methodology and search strategy. In detail, the search was based upon two research questions (RQs): (1) the infant affective state recognition techniques currently available and relevant to IRI studies, and (2) the main IRI application areas where affective computing is necessary. Answering those two RQs provides a comprehensive overview of the potentialities and limits of affective computing in the IRI field. Section 3 presents the results obtained from the literature search, and the outcomes of each RQ are explained under a dedicated subheading. These results are then discussed in Sect. 4.

2 Methods

This study was conducted as a systematic literature review based on the original guidelines as proposed by Kitchenham [32]. The literature survey was organized into two sections addressing different RQs.

RQ1

Examining the infants’ affective states recognition techniques and their relevance to IRI studies.

To promote an affective computing system suitable for IRI research areas, it is necessary to have a comprehensive understanding of the infant affective computing approaches currently available in the literature. Since infants cannot verbally express their emotions, understanding their affect has been considered by many to present a significant challenge. Indeed, emotion research in infants has received a great deal of attention over the years, and psychologists have developed various modalities for infant emotion recognition. However, questions have been raised about the soundness and applicability of these approaches in IRI applications. Therefore, this RQ aims to investigate existing emotion detection techniques and highlight features relevant for IRI applications. Besides, understanding the infant's emotional states is critical in many robotic applications and is considered fundamental to effective social relationships and psychological adjustment. Indeed, the design of robots that respond autonomously and emotionally may provide an alternative for assistive therapy. Nonetheless, before robotic systems can be endowed with affective computing abilities, the efficacy of the approaches utilized for infant emotion identification and categorization needs to be examined.

RQ2

Assessing the main research areas of affective computing in IRI applications.

Recently, there has been growing interest in designing interactive robots that humans can spontaneously and intuitively interact with. To allow spontaneous interaction, robotic applications frequently use technologies designed to recognize the affective state of the human interlocutor. However, IRI may be fundamentally different from adult human-robot interaction (HRI) in that infants are not simply small adults. Their physical, neurophysical, and mental growth is ongoing, which may result in conditions and operational circumstances that differ significantly from adult HRI [33, 34]. Furthermore, conducting studies involving infant participants places significant constraints on the experimenter and raises possible ethical issues. Therefore, the present RQ seeks to identify and describe the research fields in which IRI and infant affective state recognition are valued. Indeed, the answer to this question will provide a clear picture of the potential use of robots with infants on the one hand, while pointing out the technological challenges that must be solved in order to develop an affective computing system that satisfies the demands of IRI on the other.

The databases searched were Scopus and Google Scholar. All the papers published in conferences and journals between 1990 and 2022 were considered. Papers published from 1990 onward were considered because, since that period, developments in intelligent technology have made robotics applications valuable and profitable.

Concerning RQ1 the search was based on the words “Infant” OR “Babies” OR “Toddler” OR “Pediatric” AND “emotion” AND “Recognition” OR “Detection” OR “Computing”. In the Scopus database, those keywords were surveyed in fields such as article title, abstract, and keywords. In the Scholar database, on the other hand, the advanced search can be performed either by searching (i) the entire text or (ii) the title only. Therefore, the advanced survey was carried out by searching for “Infant” OR “Babies” OR “Toddler” OR “Pediatric” AND “Emotion” with at least one of these words: “Recognition” OR “Detection” OR “Computing”, and the field searched was the entire text. The overall search generated 771 results in Scopus and 739 in Scholar.

With respect to RQ2, the search was based on the words “Infant” OR “Babies” OR “Toddler” OR “Pediatric” AND “Robot” AND “Interaction”. In the Scopus database, the survey was set up by searching for those words within the following fields: article title, abstract, and keywords. The basic search generated 375 results. In Scholar, on the other hand, the search was based on “Infant-Robot interaction” within the entire text. A total of 260 results were obtained from the Scholar survey. The search was performed independently by two researchers.

The initial analysis consisted of filtering out papers related to subject areas such as arts, agricultural, biochemistry, environmental science, medicine, physics and astronomy, pharmacology, chemical, and economics. This procedure reduced the considered pool to 255 papers in Scopus and 106 papers in Scholar related to RQ2, whereas for RQ1 the pool was reduced to 262 papers in Scopus and 208 papers in Scholar. Therefore, the total number of papers across both RQs resulting from the first screening was 517 for Scopus and 314 for Scholar. Results from each source were carefully examined for duplications. Document types of the Google Scholar search outcomes were examined closely to ensure that they were related to reliable scientific sources. Review papers and all papers that did not refer to a user study or that did not relate to IRI or infant emotion recognition were excluded, which reduced the considered pool to 153 papers. Within those papers, 142 were related to the Scopus search and 89 were from Scholar, with an overlap of 78 papers. A manual review process was adopted for the final exclusion by scanning the papers' abstracts. Exclusion criteria covered all results that were not conference or journal papers directly related to the specific RQ. The age of the study population considered in this work ranged from 0 to 36 months. After the review process, performed in accordance with [35], 98 papers were included in the present study (Fig. 1). The resulting papers were analyzed and grouped based on their experimental applications and operative RQs. A separate paragraph was dedicated to each RQ to ensure that the appropriate literature was accurately covered and suitable to respond to the specific RQ.

Fig. 1 Literature screening procedure for the selection of papers included in the review

3 Results

Over the last several years, there has been increasing interest in the field of human-robot interaction (HRI) due to the increasing use of robots not only in industrial fields but also in other areas such as schools [36], homes [37], hospitals [38], and rehabilitation centers [39]. Therefore, research in HRI has begun exploring people's perceptions of, and attitudes toward, robot systems, including the kinds of applications and tasks for which they might be useful [40]; the attribution of competencies on the basis of their physical appearance [41]; and the relationship between the robot's physical appearance and its behavior, or the effect of its human-likeness [42]. Within HRI, IRI holds a peculiar place. Indeed, the interaction between infants and robots can be very different from the interaction between adults and robots due to the infant's continuing neuro-physical and mental development [33, 43]. A fascinating challenge in IRI is to implement a human-like interaction in which the choice of the robot's actions is made based on the infant's behavior. To this purpose, robots should be endowed with the ability to assess the infant's affective state and determine whether or not to take action based on the infant's emotional behavior. Besides, making robots respond spontaneously and sociably to humans implies that robots should have a degree of sensibility to human emotions [44]. The methodologies for infant affective state recognition relevant to IRI studies are detailed in Sect. 3.1, whilst the primary research areas of affective computing in IRI applications are discussed in Sect. 3.2. Specifically, the following paragraphs describe the results obtained for each RQ.

3.1 Technologies used to Evaluate Infants’ Affective State and their Relevance to IRI Studies (RQ1)

The strategies adopted for infant affective state recognition relevant for IRI studies are mostly focused on non-invasive techniques that can automatically detect emotions. Behavioral analysis, such as facial expression evaluation, and non-invasive physiological signal analysis are the most commonly used approaches in this regard.

3.1.1 Infants' Affective Computing through Behavioral Analysis

Understanding infants’ affective state is important to researchers, pediatric professionals, and parents alike. In fact, because infants cannot verbally report on their emotions, understanding their affect has been considered by many to present a significant problem [45]. Infants communicate with the outside world through their facial expressions, hand gestures, body action, sounds [46], and signed speech [47]. Therefore, these are important information sources to detect their emotions and needs. To study infant emotion, researchers have traditionally relied upon observable features. Observable, behavioral indicators of emotion regulation strategies in infants include facial expressions, attention shifting (e.g., gazing at or away from a stimulus), engagement with objects, and self-soothing (e.g., rubbing face or thumb-sucking) [48]. Indeed, research on emotional development in infancy has been mostly based on facial expression analysis and had greatly benefited from the use of video analysis and coding systems [49]. Literature research on the most widely used coding systems designed to analyze infants’ affective state led to three results, namely the Monadic Phases Coding System [50], the Maximally Discriminative Facial Movement Coding System (MAX) [51], and the Facial Action Coding System (FACS) [52]. Their applications result in similar affect codes, but the three systems are quite different procedurally. The Monadic Phases System assesses infant affect by combining information about facial and vocal affective expression, gaze, posture, and type of action [53]. The affect categories derived from MAX, and FACS, on the other hand, are based on facial expression only. Both these coding systems rely on facial anatomy. In detail, FACS operates under the assumption that emotions activate micro-expressions, resulting in subtle changes in facial muscles activity [54]. For this purpose, it defines individual components of muscle movement, i.e. the Action Units (AU) [55]. Specifically, it distinguishes 30 AUs. AUs are reliably associated with distinct emotions [54], based on the six universal facial expressions (happiness, anger, disgust, sadness, fear, surprise). By contrast, MAX differentiates three anatomical facial regions (forehead and brows, midface, and mouth) and discriminates among types of movement within each region [50]. Combination rules convert facial movement codes into expressions of affect. These three systems are widely used to investigate similar research problems, for example, the relation between affective expression and infants’ age, gender, and birth status. Interesting results on gender differences in infants’ emotional expression showed that boys have higher levels of arousal than girls in infancy, and boys show less language ability and inhibitory control than girls [56, 57]. Also, a large number of studies have focused on the type and the direction of influence between mothers’ and babies’ affective expression [58,59,60,61] and infant affective response to experimental conditions [62,63,64]. Furthermore, facial expressions analysis was also used to differentiate discrete infants’ emotions [65]. This latter analysis was commonly based on MAX and FACS system rather than the Monadic Phases System. Figure 2 shows examples of infants’ facial expressions associated with different discrete emotions such as surprise (a), fear (b), sadness (c), and happiness (d).

Fig. 2 Representation of infants' facial expressions associated with different discrete emotions such as surprise (a), fear (b), sadness (c), and happiness (d). Adapted from [65, 66]

Expressions, particularly when coupled with vocal and postural behaviors, provide valuable clues to the motivational state of infants who are otherwise unable to report what they feel.

Other than facial expression, a behavioral measure that is increasingly being investigated relates to body gestures and infant motion. Many applications for analyzing humans and their movements have been developed with the advance of commodity depth sensors such as the Microsoft Kinect and its body tracking capabilities. However, Kinect body tracking is limited to persons taller than 1 m [67]. Therefore, systems aiming to automate infant motion analysis have made use of sensors attached to the infant's body. Other methods overcome these limitations by fitting a simplified body model to the whole body [68] or lower limbs [69] of infants captured by RGB-D devices. The features employed for emotion recognition based on body motion could include absolute or reciprocal positions and orientations of limbs, as well as movement information such as speed or acceleration.
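As a concrete illustration of the movement features mentioned above, the following sketch computes per-joint speed and acceleration statistics from tracked 3-D joint positions (e.g., obtained from an RGB-D sensor). The array shape, frame rate, and choice of summary statistics are assumptions made for demonstration, not a pipeline taken from the cited studies.

```python
import numpy as np

def kinematic_features(joints: np.ndarray, fps: float) -> dict:
    """joints: (frames, n_joints, 3) array of x, y, z positions in metres."""
    dt = 1.0 / fps
    velocity = np.gradient(joints, dt, axis=0)        # finite-difference velocity
    acceleration = np.gradient(velocity, dt, axis=0)
    speed = np.linalg.norm(velocity, axis=2)          # (frames, n_joints)
    return {
        "mean_speed_per_joint": speed.mean(axis=0),
        "peak_speed_per_joint": speed.max(axis=0),
        "mean_acceleration_magnitude": np.linalg.norm(acceleration, axis=2).mean(axis=0),
    }

# Example with synthetic data: 300 frames of 15 tracked joints at 30 fps.
features = kinematic_features(np.random.rand(300, 15, 3), fps=30.0)
```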

3.1.2 Infants' Affective Computing through Physiological Signal Analysis

Emotion recognition using physiological signals is one of the branches of affective computing, and several researchers employ bio-signals to estimate people's affect. Indeed, although to a lesser extent, infants' emotions have also been investigated through the analysis of physiological signals. This is because the ability to control emotional reactions to environmental stimuli develops during the first years of life [48]. The behavioral and cognitive constructs of emotion regulation have been extensively examined in the developmental psychology literature, which claims that personality, social competence, and problematic behavior have their origins in (or are influenced by) early emotional control [70]. Emotional reactivity (i.e., arousal) and the regulation of that reactivity are the two processes that constitute an infant's emotional experiences [48]. The latency to respond to a stimulus, the intensity of the reaction, and the stimulus threshold required to elicit a response are all terms used to describe reactivity [71]. Internal physiological markers of arousal and regulation are present from early childhood [72]. For this reason, physiological measurements are often needed to better understand emotion regulation in response to environmental challenges. Such measures are mostly focused on autonomic nervous system (ANS) activity. Indeed, the ANS, together with the hypothalamus, regulates pulse, blood pressure, breathing, and arousal in response to emotional cues [73].

In addition, studies have indicated that specific patterns of infant physiology, including vagal withdrawal, are predictive of emotion regulation due to their relationship with affective emotional experiences. For example, when infants experienced negative affect in contexts involving interactions with another person, they demonstrated evidence of regulation via vagal withdrawal (i.e., lower respiratory rate that indicates regulation [74]). One of the frequently employed paradigms to investigate emotion regulation through physiological signal analysis in infancy is the Still-Face Paradigm (SFP). Such a paradigm was designed to investigate the role of infants in social interactions as well as infant reaction to depression stimulation. The SFP is typically composed of three sessions of face-to-face interaction between an adult and an infant: (i) a normal parent-infant interaction; (ii) the ‘still face’ moment, when the parent took on a neutral expression and was no longer responsive to the infant; and (iii) the moment when the parent resumed the interaction with the infant [75]. Physiological changes associated with arousal and regulation can also be examined during the SFP. Many studies on infants' arousal have relied solely on infant heart rate recorded through electrocardiogram sensors [76, 77]. However, other studies have used thermal infrared imaging techniques to assess infants' arousal based on skin temperature variation. Such studies relied on research showing that thermal variation can be measured at 6 months of age and autonomic changes can be inferred [78]. Indeed, the skin temperature profile is subject to various influences, including cutaneous blood flow, local tissue metabolism, and sudomotor response, all of which are in turn controlled by the ANS [79]. Consequently, advances in thermal IR imaging technology have allowed monitoring infant autonomic functions and inferring psychological and affective states. Moreover, since thermal infrared imaging is a non-invasive and contactless technique, it permits ecologically valid settings in which infant participants are free to move without restriction. For this reason, it is especially valuable for observing emotions in infancy research, since infants are difficult to engage in strictly controlled experimental settings. Aureli et al. assessed the nose tip temperature variation in 3- to 4-month-old infants in order to explore the natural human process of attachment between baby and mother, and the effects of the SFP [80]. Behavioral data were also collected. The findings confirmed a parallelism between physiological and behavioral responses: infants exhibited no signs of stress or discomfort during the still-face moment and no drop in facial temperature, which is considered a sign of stress or anxiety. In contrast, a temperature increase was recorded, in support of parasympathetic system activation as a result of infants' greater interest in the surrounding environment. This was also confirmed by the behavioral evidence, which revealed that children directed their attention outward because of the interruption of the interaction with their mothers. Moreover, researchers used thermal IR imaging to measure facial skin temperature as an index of mental stress in 8- to 15-week-old infants when they were separated from their mothers [81]. Nakanishi and Matsumura investigated changes in facial skin temperature in 2- to 8-month-old infants while they were laughing, as a typical behavior of pleasant and joyful emotion [82].

Besides skin temperature variations, physiological changes in response to emotion also occur in parameters controlled by the ANS, such as blood pressure, heart rate, electro-dermal activity, pupil dilation, and respiration rate, which are not directly recognizable by human observers. Cirelli et al. focused on emotional arousal and emotional regulation in infancy by employing skin conductance measurement [83]. Bainbridge et al., by measuring heart rate and pupil dilation, demonstrated that infants relaxed in response to unfamiliar foreign lullabies [84]. Heart rate is widely used for infant arousal detection by comparing the sympathetic and parasympathetic frequency bands of the signal. However, it is highly dependent on the body position during monitoring [85]. Minagawa-Kawai et al. studied the emotional attachment between mothers and infants [22], and Parsons et al. between parents and infants [23], both using functional near-infrared spectroscopy (fNIRS). This technique enables the measurement of the localized hemodynamic response in infants.
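A minimal sketch of the frequency-band comparison mentioned above is given below: a low-frequency/high-frequency power ratio computed from inter-beat (RR) intervals, often taken as a coarse index of sympathovagal balance. The band limits shown are the conventional adult values and the resampling rate is an assumption; infant studies typically adapt the high-frequency band to faster respiration, and this is not the exact procedure used in the cited works.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch

def lf_hf_ratio(rr_intervals_s, fs=4.0, lf=(0.04, 0.15), hf=(0.15, 0.40)):
    """rr_intervals_s: successive inter-beat intervals in seconds."""
    rr = np.asarray(rr_intervals_s, dtype=float)
    beat_times = np.cumsum(rr)
    # Resample the irregular RR series onto a uniform grid for spectral analysis.
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    rr_uniform = interp1d(beat_times, rr, kind="cubic")(grid)
    freqs, psd = welch(rr_uniform - rr_uniform.mean(), fs=fs, nperseg=min(256, len(grid)))
    lf_band = (freqs >= lf[0]) & (freqs < lf[1])
    hf_band = (freqs >= hf[0]) & (freqs < hf[1])
    return np.trapz(psd[lf_band], freqs[lf_band]) / np.trapz(psd[hf_band], freqs[hf_band])
```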

3.1.3 Relevance of the Affective Computing Techniques Based on Behavioral and Physiological Data in the IRI Field

The ultimate goal of IRI and HRI, in general, is to have robots interact socially, which also means having the capacity of generating coordinated and timely behaviors predicated on their social surroundings. Therefore, the recognition of the emotional reaction of the human interlocutors also needs to be performed in a timely manner. This aspect has a crucial impact on the selection of the emotion recognition technique to use for this purpose.

Researchers have usually focused on observable characteristics such as facial expressions or gaze analysis to assess infants' emotions. Indeed, expressions, especially when combined with vocal and postural clues, provide valuable information on the affective state of infants who are unable to convey their feelings otherwise. As a result, emotion recognition based on infants' facial expression analysis has become highly influential [86]. However, some limitations to applying this technique in the IRI field have been identified. For instance, the processing may be very time-consuming. Indeed, in most of the studies analyzed, emotions or facial expressions were encoded using stop-frame video [51]. Moreover, behavioral analysis needs to be performed by more than one observer in order to ensure inter-observer reliability, making measurement accuracy a critical concern [52]. By contrast, daily-life scenarios such as those of IRI applications require real-time responses from the sensors of interest. Automatic data recording and processing are therefore preferred. Besides, conventional recognition methods using facial images may lack recognition accuracy since they are not universal and depend on culture, gender, and age [87]. Automated approaches to assessing facial action are a potential solution to manual coding difficulties. Nowadays, these automated approaches are an active research topic in the field of computer vision and machine learning and often involve collaborations between computer scientists and psychologists [88]. Hammal et al. reported that automatic coding of AUs showed moderate to strong reliability with manual coding [89]. Yet, lighting conditions and auditory noise also make these techniques challenging to implement in a real-world environment [90]. Even if this automated approach in infants is at an early stage of research, the introduction of advanced image processing techniques, such as convolutional neural networks, into the IRI field encourages and fosters its improvement [88, 89].
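To make the automated direction concrete, the sketch below defines a small convolutional network for multi-label Action Unit detection on cropped face images. The input size, number of AUs, and layer sizes are illustrative assumptions and do not reproduce the architectures evaluated in [88, 89].

```python
import torch
import torch.nn as nn

class TinyAUNet(nn.Module):
    """Toy multi-label AU detector: each output unit scores one Action Unit."""
    def __init__(self, n_action_units: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 96 -> 48
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 48 -> 24
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_action_units)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, 96, 96) grayscale face crops; returns per-AU logits.
        return self.classifier(self.features(x).flatten(1))

model = TinyAUNet()
au_probabilities = torch.sigmoid(model(torch.randn(8, 1, 96, 96)))  # independent AU scores
```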

On the other hand, physiological channels can deliver reliable, crucial, and timely information about infants' emotional symptoms and emotional response to environmental stimuli. They are, however, mostly obtained through contact sensors [91]. Direct contact with the person's skin requires the ability and the willingness to properly wear the device. Moreover, the time required for attaching sensors to infants would not be negligible, and this could be a source of distress for the infants, further complicating the conduct of IRI studies. Even if easily wearable sensors were developed, they would have to meet various technological requirements (e.g., reliability, robustness, availability, and quality of data), which are often very difficult to satisfy. Therefore, although emotion recognition using physiological signals has gained momentum over the last decade, this approach is still hardly applicable in the IRI field. Indeed, for IRI applications it is important that these measurements do not interfere with the infants' activity. For this reason, nonintrusive measurements are required, which can be performed without requiring additional cooperation from the infant or even without the subject's awareness of the measurements. A suitable technique that fits this requirement well is thermal IR imaging, since it records physiological signals remotely. However, advancements in hardware and signal processing would be required for this technique to overcome the barriers related to its use with infants in real-life circumstances [92].

Another important aspect to consider when using sensors suitable for IRI is their camera angle. Indeed, IRI applications primarily require an ecological environment in which the child is free to move without the restriction of staying within the camera angle of the sensors. The field of view (FOV) of an optical sensor (i.e., the maximum area of a sample that a camera can image) is related to the focal length of the lens and the sensor size. The sensor size is determined by both the number of pixels on the sensor and the size of the pixels. Whereas there are visible-light cameras that can cover an overhead camera angle, thermal imaging cameras' FOV values range from 7° to a maximum of 80°. Although these values are suitable for a wide range of applications, in studies that require a wider FOV (e.g., an overhead camera angle), the combined use of multiple sensors is preferable.
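For reference, the relationship mentioned above can be made explicit with the standard thin-lens approximation, FOV = 2·arctan(d / 2f), where d is the sensor dimension and f is the focal length. The numerical values below are hypothetical and are not taken from the reviewed studies.

```python
import math

def field_of_view_deg(sensor_dim_mm: float, focal_length_mm: float) -> float:
    """Angular field of view along one sensor dimension (thin-lens approximation)."""
    return math.degrees(2 * math.atan(sensor_dim_mm / (2 * focal_length_mm)))

# Hypothetical thermal core: 10.88 mm-wide sensor behind a 9 mm lens -> ~62 degrees.
print(round(field_of_view_deg(10.88, 9.0), 1))
```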

3.2 Assessing the Main Research Areas of Affective Computing in IRI Applications (RQ2)

As robots move into more infant-centric environments, methods to develop robots that can spontaneously interact with infants are required. Robots must be capable of coordinated, timely behavior in response to the social context and their interlocutor's affective states in order to interact effectively with human users. This would require testing in the real world and addressing multidisciplinary challenges. Moreover, based on the analysis of the papers reported in this section, to achieve a successful interaction with infants, robots need to pursue two main goals: (1) to capture and (2) to maintain infants' engagement. To this end, researchers have attempted to identify modes through which robots can recognize infants' emotional states in order to socially and spontaneously interact with them. In fact, studies revealed that the success of robot acceptance lies in its capability to act as a social entity as well as its adaptability to differentiate behavior within appropriate response times and tasks [93]. At the same time, emotion recognition is a challenging task, especially when performed in a real-life IRI situation, where the scenario may differ significantly from the controlled environment in which most recognition experiments are conducted [94]. Based on the literature survey's outcome, the main IRI applications that adopt emotion recognition techniques can be divided into two research fields: (1) the use of robots for infant rehabilitation or skill improvement, also known as assistive robotics for infants, and (2) the use of social robots for infant interaction, which aims to investigate infants' perception of robotic systems. Each of the two major topics is covered in the following sections and summarized in Table 1.

Table 1 List of robotic platforms, the metrics employed and their purposes of use in infant-robot interaction research

3.2.1 Assistive Robotics for Infants

Robotic technologies, and especially assistive robotics for infants, are a growing area of research and can prove to be fundamental for infants with all kinds of disabilities. Indeed, it has been found that infants are often attracted to robotic devices, and that such technology may enable children with physical disabilities to play and exercise, and facilitate learning in those who have cognitive challenges [95]. For instance, motion demonstrations from humanoid robots have several unique advantages for studying infant motion adaptation compared to classical techniques [96]. Since infants prefer face-to-face interactions [97], interactive humanoid robots may capture and maintain the attention of infants longer than inanimate toys do. Furthermore, small humanoid robots can produce motions similar to those of infants. This ability may help the robot to inspire infants to imitate desired patterns of motion. This assumption was tested and proved effective by Kokkoni et al., who developed a pediatric learning environment using two socially assistive robots aimed at delivering motor interventions. The results of the study revealed that the robots were able to facilitate and encourage mobility in young children through play-based interaction with the robot [98]. Fitter et al. demonstrated that 6- to 8-month-old infants imitate robot motion and that robot rewards motivate infants to move in particular ways [8]. This suggests the potential role of the robot in teaching and reinforcing infants' motion. These findings are indeed based on past infant behavior research that highlighted imitation and contingency learning as two infant behaviors that could be exploited to encourage robot-based motor interventions [99]. Likewise, Pulido et al. showed that the robot was able to encourage infants to reach higher movement accelerations in order to obtain better rewards from the robot [100]. This finding demonstrated that the robot's physical embodiment and ability to provide various reward types helped it motivate infants and keep their attention for longer than other therapeutic tools. The infant-like size and humanoid anatomy of the robot also allowed it to fit in the infant's visual field and demonstrate motions that an infant can imitate. Galloway et al. employed a mobile robot to provide the first experiences of self-generated long-distance mobility to infants with special needs. The authors proved that, even without training, both typically developing infants and infants diagnosed with Down Syndrome were able to independently move themselves using a mobile robot [101]. Chen et al. furthered this result by training 26- to 34-month-old special-needs infants, sitting on a mobile robot, to navigate and avoid obstacles [102,103,104]. These observations may provide new insights or tips for future interventions in assistive robotics for infants. Moreover, researchers have outlined social robots as a tool to support the diagnosis, treatment, and understanding of developmental disorders such as autism [105].

3.2.2 Social Robotics for Infants’ Interaction

This second research area in IRI focuses on the nature of the interaction between infants and robots by posing different questions, such as: (i) how do infants perceive a robotic system, and (ii) how does this relate to the forms of contingent interaction they are able to undertake with the system [106, 107]. These studies demonstrated that infants' perception and categorization of a robot emerged and changed step by step depending on the form of interaction [106]. The infant's attitude toward the robot appeared to change based on the interaction as well. In fact, Funke et al. demonstrated that infants were more likely to be alert and engaged when the robot pursued an active interaction compared to when the robot was not active [96]. Furthermore, the robot's social-communicative interaction capability also plays a key role in mediating infants' behavioral state and their perception of the robot. Indeed, Meltzoff et al., working with a group of 64 infants who were 18 months old, showed that it is not just the appearance of the robot, its physical features, or even how it moves, but how it interacts with others and reacts that is important to the infants and drives infant perception of the robot [12]. These findings can have implications for the future design of humanoid robots and the field of social robotics in general. In addition, Michaud et al. designed a spherical robot, Roball, intending to study the impact of robotic interaction on infants and to investigate the potential role of the robot in contributing to the development of their language, affective, motor, intellectual, and social skills [95, 108]. The robot was able to move autonomously and generate various interplay situations. Although the purposes and design of the study were interesting, the authors concluded that conducting trials with infants and robots is highly challenging. Inconclusive results can occur, especially because infants' unpredictable mood can influence the interaction [108]. Hence the need to further investigate the affective state of the infant while interacting with a robot. Finally, two studies that resulted from the literature survey were related to an innovative system called RAVE (Robot AVatar thermal Enhanced language learning tool). This is a dual-agent system that uses a virtual human and a physical robot to engage 6- to 12-month-old deaf infants in linguistic interactions (Fig. 3) [78, 109]. The tool was endowed with a perception system that could estimate infant attention and engagement through thermal IR imaging and eye-tracking. It was intended to be an augmentative learning tool to facilitate language learning, in particular visual language, during one widely recognized critical developmental period for language (ages 6–12 months [110]). To this end, thermal IR imaging was used to determine the infants' emotional arousal and attentional valence during the interaction with artificial agents. In detail, the authors differentiated five discrete values of the infants' engagement: very negative (sustained decrease in attention), negative (non-sustained decrease in attention), very positive (sustained increase in attention), positive (non-sustained increase in attention), and a None signal, which indicates the absence of a reliable signal from the infant [78]. The robotic platforms used for infant interaction purposes are listed in Table 1.

Fig. 3 Experimental environment and setup employed in [78, 109]: (a) robot, (b) screen showing the avatar, (c) thermal IR camera placed in front of the baby through a slit in a black curtain, and (d) the related thermogram
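The sketch below shows one plausible way to map a continuously sampled attention-related signal (such as the thermal measure described above) onto the five discrete engagement values used in RAVE. The window lengths, slope thresholds, and use of a linear trend are illustrative assumptions, not the actual implementation of [78].

```python
import numpy as np

def engagement_label(signal, long_window=30, short_window=10, slope_threshold=0.005):
    """signal: recent per-frame samples of an attention-related measure."""
    x = np.asarray(signal, dtype=float)
    if len(x) < long_window or np.any(np.isnan(x)):
        return "None"                        # no reliable signal from the infant
    long_slope = np.polyfit(np.arange(long_window), x[-long_window:], 1)[0]
    short_slope = np.polyfit(np.arange(short_window), x[-short_window:], 1)[0]
    if long_slope > slope_threshold:
        return "very positive"               # sustained increase in attention
    if long_slope < -slope_threshold:
        return "very negative"               # sustained decrease in attention
    if short_slope > slope_threshold:
        return "positive"                    # non-sustained increase
    if short_slope < -slope_threshold:
        return "negative"                    # non-sustained decrease
    return "None"                            # flat trend: simplification, treated as uninformative
```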

3.2.3 Metrics Used for Affective Computing

By reviewing the studies reported in Sects. 3.2.1 and 3.2.2, the metrics adopted to verify whether the robots have succeeded in their purpose of identifying and eliciting infant engagement can be summarized into three types of measures, related to both behavioral and physiological analysis. In detail, emotion recognition based on behavioral analysis is represented by the eye-gaze metric and infant imitation occurrence.

Concerning eye gaze, the studies analyzed revealed that robot eye gaze has been successful in acquiring visual attention across various settings. Furthermore, most of the research in IRI has employed eye gaze and looking time as an index to evaluate the infant's engagement and attention to the robot. In addition, since infants may direct their attention more quickly to stimuli they previously found interesting [111], more complex or surprising stimuli may be more successful at acquiring and maintaining infant visual attention. Therefore, the experimental protocols of the analyzed studies often incorporate some sort of surprising action.

Infant imitation occurrence is another commonly employed metric to evaluate robot performance. It mainly concerns the child's ability to imitate the same action that the robot performs. In fact, since motion demonstrations from a humanoid robot have several unique advantages for studying infant motion adaptation, this metric is widely used in IRI. Accelerometer data, as well as motion analysis of the infant's actions inferred through video analysis, are used to evaluate whether the infant completed the action.

On the other hand, emotion recognition based on physiological signals in the IRI field is currently mostly related to facial skin temperature modulation. Such a metric is used to evaluate infants' engagement toward the robot [78]. Thermal sensing has recently been used to identify distinct thermal patterns related to subtle changes in the infant's internal state. For instance, thermal feedback indicating an increased level of infant distress was found to be consistent with a temperature decrease in specific regions of interest. Conversely, an increase in temperature was linked with infants' interest and social engagement [109]. Specifically, in Gilani et al., to compute information about the infant's affective state, the authors used the nose tip's average temperature, extracted in real time from each frame [78]. The study revealed for the first time insights into infants' psychophysiological responses to artificial agents such as robots. The metrics used in the analyzed studies are listed in Table 1.
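A minimal sketch of the per-frame measurement described above is shown below: the mean temperature of a small region of interest around the nose tip in a calibrated thermal frame. The ROI size and the assumption that a face tracker supplies the nose-tip pixel coordinates are illustrative; this is not the pipeline used in [78].

```python
import numpy as np

def nose_tip_temperature(thermal_frame: np.ndarray, nose_xy: tuple, half_size: int = 5) -> float:
    """thermal_frame: 2-D array of per-pixel temperatures (°C); nose_xy: (column, row)."""
    x, y = nose_xy
    roi = thermal_frame[max(y - half_size, 0): y + half_size + 1,
                        max(x - half_size, 0): x + half_size + 1]
    return float(roi.mean())

# One sample per frame builds the time series used for arousal/engagement analysis.
frame = 34.0 + 0.5 * np.random.randn(240, 320)   # synthetic 320x240 thermogram
sample = nose_tip_temperature(frame, nose_xy=(160, 120))
```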

4 Discussion

An intriguing challenge in the field of IRI is the possibility of providing robots with emotional intelligence in order to make the interaction more genuine and spontaneous. A crucial aspect in achieving this is the robots' capacity to infer and interpret infants' emotions. Emotion recognition has been widely investigated in the broader fields of HRI and affective computing. This review reported on emotion recognition techniques designed for infant studies, with particular regard to the IRI context. Our aim was to review currently adopted emotion recognition and robot interaction modalities for the infant population and offer our point of view on future developments and critical issues. The following paragraphs summarize considerations on the research questions addressed in this study and provide suggestions for future improvement. Finally, ethical concerns of IRI are also discussed.

4.1 Discussions on the RQs Addressed

RQ1 (i.e., Examining the infants' affective states recognition techniques and their relevance to IRI studies) helped us to identify relevant aspects of the affective computing techniques used in the literature that are useful for IRI applications. In detail, the most common and natural way to observe and recognize emotions in infants is the analysis of facial expressions. Quantitative evaluation of observational data typically consists of manual coding, on a second-by-second basis, so that statistical techniques can be applied to data gathered from a test population. However, spontaneous robotic interaction would benefit from on-line emotion recognition, making automated methods preferable. Automated approaches for infant facial expression identification are currently being developed. In particular, two of the studies reviewed in this paper used automated methods, achieving average accuracies of 80% and 81% in recognizing facial action units [88, 89]. Although such automated approaches have not yet been employed in IRI studies, their use is encouraged. By contrast, the contribution of thermal imaging in this area should not be underestimated. Indeed, it provides information about physiological parameters associated with the infant's affective state in real time and without contact.

From RQ2 (i.e., Assessing the main research areas of affective computing in IRI applications) it is possible to highlight that robots have been designed to interact with infants in a manner consistent with human psychology and following the guidelines and rules of social interaction. The studies reviewed demonstrated that the interaction between robots and infants can prove highly effective in healthcare, especially in robot-assisted therapies. Besides, it has been shown that infants have the ability to engage with robots and follow their gaze, based not so much on the robot's appearance as on its capacity to interact with others [12]. However, still little is known about infants' affective states during the interaction with a robot or a social agent.

The metrics used to detect infants' engagement and affective states in IRI applications were found to be based on behavioral indices such as eye gaze or motion analysis and on physiological cues such as skin temperature modulation, which can be revealed by thermal IR imaging. Indeed, compared to other techniques for emotion recognition through physiological signals, thermal IR imaging has the advantage of collecting thermal signals remotely. This peculiarity would enable it to be integrated into a wide range of IRI settings and applications. The behavioral index, i.e., eye gaze, is reliable and quick to process, and therefore suitable for real-time assessment. Moreover, such a metric can be easily integrated with other technologies used to recognize infants' affective states. Combining different modalities would promote the future development of affective computing in the IRI field. Indeed, whereas the existing literature tends to evaluate each of these algorithms in its own metric space, emotion recognition quality would benefit from global-level markers. Table 2 summarizes the focal points deduced from both RQs as well as the suggested future directions.

Table 2 Overview of the insight gained and suggested future direction

5 Research Challenges and Open Problems

The present study aimed to provide new perspectives towards a successful interaction between infants and robots. In this regard, the future vision is to endow the robot with the capability of assessing the infant's affective state based on real-time emotion recognition techniques. Indeed, affective state recognition capability would provide robots with a degree of “emotional intelligence” that would permit more meaningful and spontaneous IRI. However, some open problems were identified in the reviewed studies, which highlighted that emotion recognition in IRI currently remains a challenge for many robotic applications.

Those problems mostly apply to the real-world IRI scenario, which is quite different from the laboratory setting. Previous studies used an experimental methodology and paradigm within which affective computing was performed without being embedded into a realistic IRI environment. While this approach could provide a baseline metric/value, the drawback is that it may not be representative of real-world IRI. The development of reliable and accurate metrics to be used as a gold standard for affective computing in IRI could help overcome such issues.

A second open problem is the accuracy achieved in the infants’ emotion recognition and the time it takes the affective algorithms to produce the emotional outcome. Real-world application indeed requires timely and accurate analysis. To increase accuracy, a multimodal information fusion system, rather than a single infant’s emotional states recognition technique, might be suggested as a future perspective. Indeed, advanced robots will benefit from integrating capabilities to detect and interpret infants’ emotions, motions, gestures, and sounds in order to accomplish tasks in synergy with infants or humans in general. Data fusion for visual tracking has already been used for robotic interactions with adults, involving the development of a real-time system for face/hand tracking and hand gesture identification within the particle filtering framework [112]. Moreover, combining data from infrared cameras with data from visible cameras, in order to integrate behavioral and physiological cues of the human interlocutor’s emotions was also proved effective in child-robot interaction applications [29]. A depth camera, in addition to infrared and visible cameras, can also be integrated into a multimodal data fusion system to improve the accuracy of motion and gestures analysis [113] and provide accurate cues for real-time infants’ emotions recognition by the robotic system.
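As a simple illustration of the fusion idea discussed above, the sketch below performs a weighted late fusion of per-modality engagement scores (e.g., from gaze, motion, and thermal channels), skipping any modality that temporarily drops out. The modality names, weights, and score scale are assumptions for demonstration only.

```python
def fuse_engagement(scores: dict, weights: dict) -> float:
    """scores: per-modality engagement in [0, 1]; modalities missing from scores are skipped."""
    available = {m: w for m, w in weights.items() if m in scores}
    if not available:
        raise ValueError("no modality available")
    total = sum(available.values())
    return sum(scores[m] * w for m, w in available.items()) / total

fused = fuse_engagement(
    scores={"eye_gaze": 0.8, "thermal": 0.6},            # motion channel dropped out
    weights={"eye_gaze": 0.5, "motion": 0.2, "thermal": 0.3},
)
```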

In addition, when dealing with infants, a successful emotion-aware robot should be able to recognize different emotional states and be prepared to adjust its behavior based on the infant's needs. As a result, it would be ideal, although challenging, to develop a database of the infants so that the robot can keep track of prior contacts and moods, as these aspects might influence social interactions with the infant. Moreover, a still-open problem is the amount of infant data currently available for analysis. The database thus developed might also be used to train powerful machine learning algorithms that require a significant amount of data.

Finally, recognizing and classifying infant emotions may be a viable step toward synthetic reproduction of human emotions by the robotic system, which is a current fascinating challenge for most researchers in IRI and developmental robotic fields.

5.1 Future Perspective On the Developmental Robotics Field

Reliable infants’ emotion recognition technique would also greatly benefit the developmental robotics (DR) field. Indeed, the DR field aims to develop sensorimotor and cognitive capabilities in robots by drawing inspiration from child psychology and by modeling developmental changes [114]. Inspired by infants’ cognitive process, some researchers applied development theories into robotics. From these theories, researchers can understand the way infants build their structures of knowledge and develop their behaviors, language, and other complex skills [115]. Then let robots learn like human infants. DR is a rapidly growing research area having increasing interest, which constitutes an interdisciplinary approach to robotics, located at the intersection of developmental psychology and robotics [116]. It is, indeed, promoted by different driving forces: engineers seeking novel robotics advancements as well as neuroscientists interested in gaining new insight from trying to embed their models into robots [117]. Until now, some advances in neuroscience research have been successfully used for robotic development, such as sensorimotor architectures, motion control strategies, and behavioral models [118, 119]. Whereas, understanding and reproducing the mechanism of human learning, emotions, and curiosity is still an open debate [120]. At the same time, for advanced robot development, it is essential to make robots with anthropomorphic and diversified emotions in order to achieve efficient communication with the environment and humans [118]. To this end, endowing the robot with the capability of the on-line assessing of infants’ affective state could potentially pave the way for innovative applications in this field.

5.2 Ethical Consideration on IRI

Evaluating the ethical aspects of IRI is crucial when referring to sensitive populations such as infants. Robots for infants are likely to be designed so that their appearance, movements, and interactions encourage infants to form a relationship with them. The main concern in the literature is whether this is a form of deception and whether it is ethically acceptable [121]. Since infants have a strong social drive and a lack of technological expertise, they are more likely to overestimate the capabilities of robots that have features comparable to those of humans or animals [122, 123]. Although this aspect can positively affect applications such as rehabilitation or pediatric care, it can become worrisome if the interaction between the child and the robot is long-lasting over time. Indeed, there is a risk that such overestimation by the infants themselves might result in them spending too much time with robots. This could reduce the amount of time they spend in the company of a sensitive human caregiver and hamper the development of their understanding of how to interact with friends or other human beings [124]. Similarly, a robot is not going to be an adequate replacement for a parent in terms of emotional relationships [125].

6 Conclusion

This review is intended to provide insight into the benefits and current research challenges of infant emotional state recognition methodologies in the IRI field. The robotic system's ability to recognize infants' emotions is the key to providing effective and spontaneous infant-robot interaction. Indeed, the studies reported in this review demonstrated that the interaction between robots and infants can be greatly useful in healthcare, especially in robot-assisted therapies [8, 100, 107], further highlighting the importance of investigating the robot's influence on infants and the infant's affective state during robotic interaction. According to the general guidelines derived from the studies reviewed, infant affective state recognition techniques are appropriate for use in the IRI field as long as they are able to provide real-time responses and the sensor used does not interfere with the infant's activity or constitute a source of stress. From this perspective, contactless techniques should be favored. These include eye gaze, facial expression analysis, and thermal IR imaging. All the considered modalities represent promising information sources for future developments. Yet, infant emotion recognition remains a challenge for robots due to the need for accurate and timely outcomes. Future directions on this course include the employment of innovative and accessible technologies, such as depth cameras and smart devices, along with advances in computer vision and machine learning that could lead to rapid development of automated emotion recognition modalities and emotion-aware robots. Furthermore, the development of a robotic platform endowed with a multimodal emotion recognition system that integrates physiological signal analysis such as thermal imaging with visible-domain analysis would make a significant contribution to science. To conclude, this review encourages and outlines guidelines for the use of affective computing in IRI applications and potentially provides support for future developments of emotion-aware robots.