1 Introduction

Autism Spectrum Disorder (ASD) is a developmental disability defined by the diagnostic criteria that include deficits in social communication and social interaction, and the presence of restricted and repetitive patterns of behaviour, interests, or activities that can persist throughout life (American Psychiatric Association, 2013a). Furthermore, individuals with ASD show difficulty in generalization, in transferring knowledge to a context other than the one in which the knowledge was acquired (Myles, 2013). Due to the diversity and specificity of symptoms, professionals have encountered some difficulties in developing effective support methods. In order to mitigate these individuals’ impairments and to improve the support process, in recent years research has been carried out focusing on the use of robots (Pérez-Vázquez et al., 2020), mechanical components or interfaces in the interaction with individuals with ASD. Individuals diagnosed with ASD have been shown to have a great affinity with technological devices (Dautenhahn & Werry, 2004; Mwangi et al., 2018; Tapus et al., 2012), including individuals who are not likely or willing to interact socially with their peers, parents, and professionals, among others. Moreover, improvements were observed in social behaviours such as imitation, eye gaze, and motor skills when interacting with such devices. Most research in the literature focuses on the use of social robots to interact with individuals with ASD, showing that robots produce a high level of encouragement and engagement in these persons. The physical appearance of the social robots used in the literature varies greatly (Catlin & Blamires, 2019), from four-wheeled mobile robots (Ferrari et al., 2009), and animal-like robots (Breazeal, 2000; Sosnowski et al., 2006), to humanoid robots (Costa et al., 2015a; Silva et al., 2017). 
More recently, researchers have been using social robots with a humanoid design, as they offer pronounced generalisation potential, particularly in imitation and emotion recognition tasks (Begum et al., 2015; Costa, 2014).

In addition to the use of social robots, Objects with Playware Technology (OPT) have been applied in support sessions for children with ASD. Lund, Dam Pedersen, and Beck (2009) define Playware as the use of intelligent hardware and software, which can be used to transfer knowledge while creating playful experiences. The use of OPT that are able to react to players’ actions in an appropriate way and consequently create a dynamic interaction between the user and other players can provide an exceptional occasion to quantify behaviour. There are a few works on the use of OPT with children with ASD. Lund, Dam Pedersen, and Beck (2009) conducted an experiment where interactive tiles were used as a modular robotic OPT with participants with ASD. These tiles were designed to be flexible in both set-up and activity building for the end-user. One of the activities played by children with ASD consisted in arranging the tiles according to their neighbour tile colour in order to produce new colours. There were three main tiles with the primary colours red, green and blue that never changed their colour. Then, other tiles were connected to the main ones. These tiles would get the same colour as the main one or a new colour would be shown if the tile was connected, for example, between two of the main tiles. The authors collected several variables such as the number of tiles used during the experiment, the average and maximum number of clusters (2 or more tiles assembled) created by the user, among others, and fed these features into a feed forward neural network in order to understand whether possible differences in criteria scores can be used to recognize any specific behaviour pattern—trying to recognize the individual child. 
Although the dataset was small, the network achieved an accuracy of 88% in recognizing a specific child’s play pattern. This led the authors to conclude that the results of the experiment with children with ASD offer an interesting new direction for research: investigating playware as play tools for cognitively challenged children, giving children a playful experience and analysing the playful interaction to provide insights (and possibly a diagnosis).

In general, games concerning the learning of social emotional skills (e.g. imitation and recognition of emotions) are played during support sessions with children with ASD in a Human–Robot Interaction (HRI) scenario (Costa et al., 2015; Pennisi et al., 2016; Soares et al., 2019). However, in more traditional support sessions, professionals often use storyboards (Kokina & Kern, 2010) as one of the tools to help foster social skills in individuals with ASD. According to Gray and Garand (1993): “a social story describes a situation, skill, or concept in terms of relevant social cues, perspectives, and common responses in a specifically defined style and format”. A social story can therefore be, for example, a powerful tool to decrease inappropriate behaviour (Scattone et al., 2006). Storyboards usually focus on portraying one of the following topics (Social Stories & Comic Strip Conversations, 2021): daily living skills, unexpected events, transitions, adolescent skills, and social situations.

Most children with ASD have difficulty in processing auditory information due to alterations in central auditory processing (Ocak et al., 2018). Thus, they have difficulty in verbal discrimination and may react in a hyper- or hyposensitive way to sounds. As vision can be one of the strongest skills of children with ASD (de Lima & Levy, 2015), visual support can be fundamental in helping them understand requests and tasks. Therefore, storyboards are provided with visual cues.

In general, the literature offers some guidelines on how to build storyboards portraying a social story, which professionals often use as a traditional support tool (Kokina & Kern, 2010). However, there is little or no guidance on how to develop storyboards focused on training affective skills in an HRI scenario where the robot is the main character telling the story. One of the few works that developed and successfully evaluated (with a large sample) a storyboard tool in an HRI scenario found significant differences in the learning of emotional skills in children who interacted with the robot compared to children who followed traditional methods of support (Soares et al., 2019). The work uses visual cues in the form of physical cards showing the story scenario while the robot tells the story. Additionally, another set of cards was given to the child so that he/she could choose the correct affective state of the character at the end of the story. One of the limitations found in the study was the difference in attention span. This may be due to the many things that each child has to focus on at the same time, namely the robot, the physical card that shows the story scenario, and the cards with the answers.

The research team proposed a new initial approach of using both OPT and social robots with the goal of promoting social interaction with children with ASD (Silva et al., 2018). It consists of a hybrid approach where an OPT device is to be used as an add-on to an HRI with a robot that is capable of displaying facial expressions. To the authors’ knowledge there is no research considering the use of OPT as an add-on to social robots. With this setup, the OPT device will provide visual support cues during the interaction. The present work consists in defining the essential features of an OPT device, as well as the main characteristics of story images, and in describing how the interaction between a robot, the OPT and a child should be conducted.

A four-step study was conducted using both qualitative (a focus group and think-aloud sessions) and quantitative (an online questionnaire and a pilot study) research methodologies.

One of the contributions of the present work consists in presenting the necessary aspects to take into account when designing storyboards for an HRI scenario, namely by using the focus group method to promote discussion among specialists and, when possible, corroborate the findings with the literature. Additionally, by combining the outputs of the focus group and the think-aloud sessions, innovative insights will be produced about the design features of the OPT that will be used in the interaction process with a robot and a child, namely its format and modes of feedback, among others. Considering these insights, new story scenarios for the OPT device were developed and evaluated through an online questionnaire and a pilot study with children with ASD. The goal of the pilot study was to verify the main constraints of this new approach and evaluate the adequacy of the scenarios of a storytelling game during supporting sessions with children with ASD.

In addition to this Introduction, this paper is organized into four more sections: Sect. 2 presents the scope of the present work; Sect. 3 shows the methods employed; Sect. 4 discusses the results obtained; the conclusions and future work are addressed in Sect. 5.

2 Scope

The present work is part of a larger project whose main goal is to develop and use new interactive tools to support sessions with children with ASD (HomePage - Robótica Autismo, 2018). The current approach uses the humanoid robot ZECA (Zeno Engaging Children with Autism), which is capable of displaying facial expressions (Fig. 1) thanks to several servomotors in its face and a special material, Frubber, which has a similar feel to human skin.

Fig. 1

The different facial expressions displayed by the robot ZECA: (a) anger, (b) fear, (c) happiness, (d) surprise, and (e) sadness (Soares et al., 2019)

The main idea is to employ a hybrid approach that uses a robotic platform (Fig. 1) and OPT to promote social emotional skills among children with ASD (Silva et al., 2019). One of the serious games developed was the storytelling game, in which the robot tells a story and, at the end, the child has to identify how the robot felt in that scenario. Thus, it is important to define the main characteristics of this new OPT device, to design the image scenarios, and to conduct a pilot study of this solution to assess the acceptance of the system by children with ASD.

3 Methods

With the present study, the research group aimed to evaluate the adequacy of a set of objects and scenarios used for emotion recognition in a triadic activity (involving a robot, a child, and a researcher or therapist) with children with ASD. The scenarios under analysis were created and adapted by the research team from a set of stories previously written by the same team (Soares et al., 2019). Each scenario/story describes a situation alluding to a certain emotion, such as joy, sadness, fear, surprise or anger.

Different research techniques were considered to evaluate the scenarios and the role of the new OPT device together with the robot during the interaction process with a child. A four-step study (Fig. 2) was conducted. It included a focus group, think-aloud sessions, an online questionnaire, and a pilot study.

Fig. 2

Four-step study consisting of: a focus group session, think-aloud sessions, an online questionnaire, and a pilot study

The focus group intends to explore and debate the opinions of the professional participants about the game scenarios, the OPT device, and the role of the robot. With the think-aloud, specialists must verbalize their thinking process. To safeguard the validity of the verbalizations, we took into account two guidelines: (1) use a neutral instruction to think aloud that did not request specific types of verbalizations; (2) practice the think-aloud during a session to allow participants to become familiar with verbalizing their thoughts. The participating experts in this study defined the appearance, interaction, and ergonomics of the final OPT prototype, as well as validated the final storytelling scenarios. The online questionnaire aims at validating the image scenarios that were developed following the guidelines of the focus group and the think-aloud sessions with the stories. Finally, the pilot study is used to detect the constraints of the system and to verify if it allows implementing a procedure that makes the children able to interact in a comfortable and natural way during a support session.

For the focus group, the think-aloud sessions, and the online questionnaire, the participants or, in the case of children, their parents/tutors signed informed consent forms.

During the study, besides the informed consents signed by parents/tutors of the children who participated, all other ethical procedures involving these children were considered, namely the co-approval by the Subcommittee on Ethics for Life and Health Sciences (SECVS) of the University of Minho Ethics Committee (CEUM), and the school where the experiments took place. The following subsections detail how the different steps of the study were developed.

3.1 The Focus Group

The focus group was considered the most appropriate data collection technique to accomplish the research goals set for this study. This technique enables the researcher to analyse a topic in depth and emphasises the interaction between participants and their opinions (Amado, 2014; Stewart & Shamdasani, 2014). The goals of the focus group were defined according to three main categories: the game scenarios, the OPT device, and the role of the robot. As for the scenarios, the goals were: (1) to analyse the amount of stimuli; (2) to analyse the spatial location of the elements of each scenario; (3) to analyse the adequacy of the colour palette; and (4) to define the physical appearance of the main character. Regarding the OPT device, the following goals were established: (1) to define possible forms of manipulation for the OPT; (2) to define the kind of feedback most suitable for children with ASD; and (3) to define the size and shape of the OPT device. Finally, the role of the robot during the interaction was to be defined.

Participants were intentionally selected to maximize the detail of knowledge about the phenomenon under study (Stake, 2010). The focus group comprised 10 participants, selected because they are representative of professionals in Portugal who support children with ASD aged six to nine years. Participants were selected according to the following intentionality criteria: professionals with different backgrounds in terms of training area (psychologists, occupational therapists, speech pathologists, teachers, and kindergarten teachers); who had at least five years of experience in supporting children with ASD; and who were both interested and available to participate in the focus group (cf. Table 1).

Table 1 Characteristics of the professionals

An interview script was prepared in order to establish a set of topics to be discussed in the focus group. Three types of questions were defined: (1) general questions, to introduce the study to the participants (its goals and dynamics) and to help members get acquainted; (2) specific questions, directed to the three categories under analysis (scenarios, OPT device, and the robot role in the interaction), addressing topics such as the adequacy of the scenarios and the amount of stimuli, the adequacy of the spatial location of the distinct elements that make up the scenarios, the type and adequacy of the colour palette used in the scenarios, forms of manipulation, type of feedback, format of the new OPT, the role of the robot and how the interaction should be conducted; and (3) closing questions, allowing a reflection on the main ideas and a space for additional thoughts (for instance, participants were asked about aspects they found relevant but that were not included in the discussion topics, and about advice and suggestions they wished to share with the researchers).

A total of 15 scenarios portraying different emotions were projected for participants to analyse. For each scenario, two different sample images were used as background. One of them, identified as A, is based on real-world environments. The other, identified as B, has a more cartoon-like general appearance. Figure 3 shows the two sample images of the scenario representing anger.

Fig. 3

The two sample images of the scenario representing the emotion anger. In scenario A (left), the background and the characters are based on real-world environments. Scenario B (right) has a more cartoon-like appearance

After each projection, participants were asked to analyse the dimensions related to the scenarios. Questions related to the OPT device and the role of the robot were also prompted to the participants, following the goals of each of the three main categories in the focus group study.

After the professionals who met the criteria for inclusion in the focus group had been identified, an email was sent to them with its goals, as well as organizational details such as date, place, and time. A request for informed, clarified, and free consent containing the ethical procedures to be followed in the focus group session, as well as a statement of acceptance regarding the audio and video recording, was also sent to all participants. The focus group took place at the university where the authors of this paper work, in a room with adequate conditions in terms of comfort, light, and temperature. The focus group, which lasted 90 min, was conducted by a moderator assisted by a co-facilitator trained in this data collection technique, who helped to facilitate the discussion and ensure that all topics were covered.

Data were transcribed verbatim and cross-referenced with notes taken throughout the focus group session. The analysis process comprised: (a) a pre-analysis stage, in which the data were observed, heard, and transcribed; (b) an exploration stage, in which the data were coded into previously defined analysis categories according to the study goals; and (c) an analysis stage, in which the thematic units were identified. The defined analysis categories were coherent with the goals of the three main categories (the scenarios, the OPT device, and the role of the robot): (1) amount of stimuli; (2) spatial location; (3) colour palette; (4) forms of manipulation and interaction; (5) type of feedback; (6) size and format of the OPT; and (7) physical appearance of the main character.

3.2 Think-Aloud Sessions

Three think-aloud sessions were conducted with two experts (A1 and A2) with backgrounds in early childhood special education, namely Autism Spectrum Disorder, and multimedia and product design (Table 2). The participants of the think-aloud sessions gave their consent to participate in this study. The sessions were conducted with the main goal of defining the appearance, interaction, and ergonomics of the final OPT prototype, as well as validating the final storytelling scenarios. In these sessions, the participants were presented with the sketches prepared for each scenario with the respective storyline following the main conclusions of the focus group study. The participants were invited to give their impressions of each image scenario in terms of (1) amount of stimuli; (2) spatial location of the elements of each scenario; (3) adequacy of the colour palette. While the professionals were thinking aloud, the moderator and the co-facilitator remained silent, thus avoiding interrupting their thought patterns.

Table 2 Characteristics of the think-aloud experts

The verbal reflections/responses of the participants were collected, and the data were analysed following the same three main categories previously defined. In addition, the designs of the OPT device were developed following the reflections made by the participants of the think-aloud sessions. This was an iterative process that culminated in the final design of the OPT device and scenarios.

3.3 The Online Questionnaire

An anonymous online questionnaire was administered to a total of 138 volunteer participants, 69 of whom were adults (20–67 years old, 69.6% female and 30.4% male) and the remaining 69 typically developing children (7–11 years old, 56.5% female and 43.5% male) (Silva et al., 2020). The text and the image of each of the 15 stories were presented in random order in the questionnaire. Participants had to read the story, observe the image, and select the emotion (anger, fear, happiness, sadness, or surprise) that corresponded to the story. The purpose of the online questionnaire was to validate the image scenarios that were developed following the guidelines of the focus group and the think-aloud sessions. The time required to complete the online questionnaire varied according to who answered, adult or child, but did not exceed 10 min on average.
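As an illustration of how such responses could be scored, the sketch below computes, for each story, the proportion of participants whose selected emotion matches the intended one. It is a minimal sketch and not the authors' analysis code: the function name `recognition_rates`, the tuple layout, and the story identifiers are assumptions introduced here.

```python
from collections import Counter


def recognition_rates(responses):
    """Return the share of matching answers per story.

    `responses` is an iterable of hypothetical tuples
    (story_id, intended_emotion, selected_emotion).
    """
    shown = Counter()    # how many times each story was answered
    matched = Counter()  # how many answers matched the intended emotion
    for story_id, intended, selected in responses:
        shown[story_id] += 1
        if selected == intended:
            matched[story_id] += 1
    return {story: matched[story] / shown[story] for story in shown}


# Hypothetical answers, for illustration only.
example = [
    ("story_01", "anger", "anger"),
    ("story_01", "anger", "fear"),
    ("story_02", "happiness", "happiness"),
]
print(recognition_rates(example))
```

A low rate for a given story would flag an image scenario whose intended emotion is not being read as designed.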

3.4 The Pilot Study

A small pilot study was carried out in a school setting with 4 children with ASD to evaluate the approach obtained in the described four-step study.

The selected children were between 6 and 10 years old (M = 8.75; SD = 0.96). Although ASD is more prevalent in boys (Christensen et al., 2016), the selected sample has a gender-balanced representation. All children who participated in the study are verbal, but their attempts to initiate interactions and make friends are difficult and typically unsuccessful (Level 1 of severity levels, DSM-5, 2013). The participants are high functioning according to their diagnosis. A total of 3 sessions were performed per child.

The pilot study was conducted in a triadic setup, i.e., child-robot-researcher. The sessions were recorded from two points of view as depicted in Fig. 4.

Fig. 4

The experimental design used during the experiments in a triadic configuration: child-robot-researcher

This layout was proposed to allow a basis for comparison between participants across sessions, since the experiments were conducted in an unconstrained environment (in this case a school) but familiar to the child.

The selected task was the storytelling game, in which the robot ZECA tells stories randomly selected from among 15 social stories and the participant has to identify the emotion of the main actor, i.e., the robot. The same scenarios that were designed with the input of the focus group and the think-aloud sessions were used in these stories. As the robot starts telling a story, an image representing the social context of the story is simultaneously shown as a visual cue. Then, the child is prompted to answer how the robot felt in that scenario. The child selects the answer by tilting the OPT device backward and forward, scrolling through the facial expressions (common emoji) displayed by the OPT, and touching the image. When the answer is selected, a positive or negative reinforcement is presented simultaneously by ZECA and the developed OPT. The type of reinforcement is configurable in both the robot and the OPT according to the child’s preferences. In this pilot study, the types of reinforcement used were the same for three children (robot: verbal + movement + sound; OPT: visual + haptic) and different for one child (robot: verbal + movement; OPT: visual + haptic).
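The selection logic described above can be sketched as a small state model. This is a hedged illustration only: the class `AnswerSelector`, its method names, the wrap-around scrolling, and the returned dictionary are assumptions made for the example and do not describe the actual OPT firmware.

```python
from dataclasses import dataclass, field

# The five emotions used in the stories.
EMOTIONS = ["anger", "fear", "happiness", "sadness", "surprise"]


@dataclass
class AnswerSelector:
    """Hypothetical model of one answer prompt on the OPT device."""
    correct: str
    options: list = field(default_factory=lambda: list(EMOTIONS))
    # Reinforcement modes are configurable per child (assumed representation).
    opt_feedback: tuple = ("visual", "haptic")
    robot_feedback: tuple = ("verbal", "movement", "sound")
    index: int = 0  # only one option is displayed at a time

    def tilt(self, direction):
        """Scroll one option: +1 for a forward tilt, -1 for a backward tilt.

        Wrapping around at the ends of the list is an assumption.
        """
        self.index = (self.index + direction) % len(self.options)
        return self.options[self.index]

    def touch(self):
        """Confirm the displayed option and describe the reinforcement."""
        chosen = self.options[self.index]
        is_correct = chosen == self.correct
        return {
            "answer": chosen,
            "correct": is_correct,
            "reinforcement": "positive" if is_correct else "negative",
            "opt_feedback": self.opt_feedback,
            "robot_feedback": self.robot_feedback,
        }


selector = AnswerSelector(correct="happiness")
selector.tilt(+1)   # now showing "fear"
selector.tilt(+1)   # now showing "happiness"
print(selector.touch())
```

Keeping the feedback modes as plain configuration values mirrors the requirement that reinforcement be adjustable to each child, for example by dropping "sound" from `robot_feedback`.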

The aim of the pilot study was to detect the constraints of the system and verify if it allows implementing a procedure that makes the children able to interact in a comfortable and natural way during a support session.

The videos of the experiments were coded in terms of frequency and time in order to quantify different behaviours. For example, the frequency and the duration of the children's gazes at the OPT device were recorded. The number of times that children needed help and the number of incorrect uses of the OPT were counted. Additionally, the numbers of correct answers, incorrect answers, and unanswered prompts, together with the total number of robot prompts during the sessions, were recorded. The average response time of the children to the robot’s prompts was also registered. At the end of each session, the robot asked the participant if he/she wanted to play more. This information was also recorded.
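To make the frequency and time measures concrete, the sketch below summarises coded gaze intervals per target. It is an illustrative sketch under stated assumptions: the function `summarise_gaze`, the `(target, start_s, end_s)` tuple format, and the sample values are hypothetical and are not taken from the study's coding scheme.

```python
from collections import defaultdict


def summarise_gaze(events):
    """Summarise coded gaze events per target.

    `events` is an iterable of (target, start_s, end_s) tuples, with
    times in seconds. Returns, for each target, the number of gazes
    (frequency), the total gaze time, and the mean duration per gaze.
    """
    totals = defaultdict(lambda: {"frequency": 0, "total_time": 0.0})
    for target, start_s, end_s in events:
        if end_s < start_s:
            raise ValueError(f"gaze at {target!r} ends before it starts")
        totals[target]["frequency"] += 1
        totals[target]["total_time"] += end_s - start_s
    return {
        target: {**values,
                 "mean_duration": values["total_time"] / values["frequency"]}
        for target, values in totals.items()
    }


# Hypothetical coded intervals for one session.
session = [("OPT", 0.0, 2.0), ("robot", 2.0, 5.0), ("OPT", 5.0, 9.0)]
print(summarise_gaze(session))
```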

The non-parametric Wilcoxon signed-rank test (alternative to parametric paired t-test) was used to compare children’s attention during the storytelling game scenario using children’s mean duration per gaze. This test is reported using the Z statistic.
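For readers unfamiliar with the test, the following sketch shows one common way to obtain a Z statistic for paired samples using the normal approximation with a tie correction. It is not the authors' analysis script: the function name, the choice to drop zero differences, and the sample values are assumptions, and with very small samples the normal approximation is only a rough guide.

```python
import math


def wilcoxon_signed_rank_z(first, second):
    """Return (w_plus, w_minus, z) for two paired samples.

    Zero differences are dropped and tied absolute differences receive
    their average rank; z uses the normal approximation.
    """
    diffs = [b - a for a, b in zip(first, second) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        group_size = j - i + 1
        tie_term += group_size ** 3 - group_size
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(variance)
    return w_plus, w_minus, z


# Hypothetical mean gaze durations (seconds) for four children in two
# paired conditions; these numbers are illustrative, not study data.
condition_a = [1.0, 2.0, 3.0, 4.0]
condition_b = [1.5, 3.0, 4.5, 6.0]
print(wilcoxon_signed_rank_z(condition_a, condition_b))
```

In practice a statistics package would normally be used; the hand-written version is shown only to expose the ranking and standardisation steps behind the reported Z value.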

4 Results

This section discusses the results of the four-step study: the focus group, the think-aloud sessions, the online questionnaire, and the pilot study.

4.1 The Focus Group

The Focus Group analysis was organized according to its three study categories: the game scenarios, the OPT device, and the role of the robot.

Regarding the first goal for the scenarios, the amount of stimuli, the focus group suggested decreasing the number of objects in the scenarios: “…therefore, the first aspect would be to have only what matters, in this story, which is Zeca, Alice, and the game…focus the attention of the child with autism spectrum disorder on the face of the robot and the emotion that the robot is going to have; … I think it would be better to have fewer objects …” (P5). Following the same idea, participant P10 proposed: “… remove most of the elements you may have”. Moreover, in all scenarios, the participants recommended directing the focus to the characters in the story and using a simple element to describe the scenes: “Yes, leave only one element that identifies the place … and focus the child's attention on the character” (P1). Thus, presenting only the materials needed to complete the activity and eliminating irrelevant and potentially distracting materials can be a good visual arrangement (Hampshire & Hourcade, 2016). Finally, the objects placed in the scene should be similar to those that children usually see and play with: “…I would use what kids usually play with…” (P9).

Regarding spatial location in the scenarios, the objects that identify the scenes should be placed in the background relative to the main character: “… put the elements in the background there or somewhere in the picture, showing that this is a playground because it has a slide…” (P10). Other participants also supported this claim: “This allows contextualization” (P9). Participant P3 further stressed this idea: “Often this is the difficulty here, this contextualization, but at the same time we do not want it to be the focus of the child's attention, but rather a global understanding of the situation.” This idea is supported by Happé and Frith (2006), who state that “a person with ASD prefers a detail-oriented analysis and demonstrates great difficulty, or even inability to process global information”. Furthermore, children with ASD can readily segment material into parts, but they struggle when they need to deal with its global organization (Mammarella et al., 2014). Also, as the scenarios are static images, the participants suggested adding a sense of movement to some of them to give the idea that something is happening. For example, in one of the scenarios there is a tower of blocks that the supporting character, Alice, knocks down, so it was suggested that some blocks be shown falling: “You can have the tower and the top with her hand, some falling and others on the ground.” (P9). In another scenario, it was suggested to add a crowd of people appearing to walk: “Exactly, I would insert some movement, like people walking …” (P3).

Concerning the colour palette of the scenarios, the participants suggested the use of more vivid and contrasting colours, to accentuate the objects in the scenario and draw the child’s attention since colours usually attract the child—“… I would tweak the colour palette in the room, I think (Hã) overall it is all very lively and I think the wall and floor should have a more beaten colour and the toys should have brighter colours to help focus (Hã), on the event…” (P9). In the same sense—“… there has to be a colour palette that gives a contrast like that, to differentiate.” (P2). Moreover, the participants suggested not to use white as a background colour since they considered that it is not a warm/cosy colour—“ … the background wall may have a colour, not white, white will give an, an air … it is not cosy, …” (P9).

Regarding the appearance of the main character, ZECA (Fig. 5), participants agreed to use a more cartoon-like avatar instead of the real image of the robot: “Throughout the various scenarios that we see (Hã) I identified, at the level of facial expression, I identified more with the, the avatar than with the robot, yes. That's why I think it is easier here to perceive the ZECA, but in cartoon form.” (P6). Participant P3 further reinforced this opinion: “It is because in the scenarios that we have been choosing, we have been defining, I think it fits more, this cartoon alike image”. In the same sense, participant P9 complemented this view: “It looks like it is part of the picture.” Moreover, the participants recommended that the facial expressions of the main story character, ZECA, should convey the emotion in the story, i.e., if in the story ZECA was angry, its facial expression in the image should reflect the same emotion: “…it would help if the robot showed some facial expression, in this scenario it could be an angry facial expression…” (P3). The other characters in the stories should display a neutral facial expression or, when the feeling is mutual, the same expression as the main character, ZECA.

Fig. 5

Physical appearance of the main character of the stories. Left: a photo of the robot. Right: a cartoon representation of the same robot

Regarding the design of the OPT and considering how the answer options should be presented to the child, the participants suggested that only one option should appear at a time. Since the device has a touch-sensitive display and an Inertial Measurement Unit (IMU), the child could select the answer by touching the arrows, as shown in Fig. 6, or by tilting the device backward and forward to scroll through the options, promoting a more dynamic way of interaction: “That’s it, that’s part of dynamism, I still realize, the idea of sweeping is more intuitive yes, but this question is also dynamic … It is more elaborate, the gesture requires more competence, but it is possible to be even more motivating, more dynamic…” (P3). The same participant further stressed this idea: “… I say again in terms of competence it requires more than just the sweep or the option to press the arrow, but it becomes perhaps more dynamic.” (P3).

Fig. 6

The interface that is prompted to the child on the OPT where he/she has to select the correct answer by pressing the arrows in order to scroll through the different emojis that reflect the emotion presented in the storytelling scenario

The participants suggested that manipulation/interaction should be configurable according to the child’s capacity.

Regarding the overall size of the new OPT device, participants agreed with the use of a 5-inch display: “Yes” (all participants). In addition, one participant expressed concern that a larger screen could make the tilt gestures harder to perform and less comfortable for the child: “… I realize that having a bigger screen could be better in visual terms, but in terms of interaction, too big can be difficult to operate…” (P3). This perspective is in line with the one advocated by P8: “I think they are so used to mobile phones, and all sorts of games, that they focus their attention on them, on devices like that, as the first one, and it's easier to handle something smaller”.

Considering the types of feedback of the OPT, the participants were invited to discuss the option of using visual (through the display and multicolour LEDs), sound, and haptic feedback. First, the participants suggested the use of visual feedback, for example, by displaying an animation with fireworks: “Or like those little things coming out, like fireworks, that's right…” (P8). However, another member of the group proposed the use of simple feedback images, without animation: “I wouldn’t put animation, put good in green, try again in yellow or…” (P5). Other participants agreed with this idea: “Ah! Yes. I agree with P5, the positive and negative gesture I find it simpler.” (P4). Participant P8 supported the same idea: “It can also be”. Additionally, the inclusion of other stimuli such as haptic feedback, sound, and multicolour LEDs was discussed. However, some participants expressed concerns about the use of several forms of feedback: “It is too much confusion, a lot of stimuli, we already have the sound stimuli, the visual one and we will still have the haptics…” (P3).

On the other hand, other participants were more open to this idea: “Yeah, the game can have this repertoire. And I don't know, try it because you might have a child who enjoys it.” (P10). For children with ASD, learning via reinforcement feedback can be one of the most effective approaches to reinforcing desired behaviours because, throughout learning, an association is formed between a suggestion or action and a reinforcer with some intrinsic motivational value (Schuetze et al., 2017). Following this idea, the participants agreed with the use of all the proposed forms of feedback, but suggested that they should be configurable for each child: “Yes you will have to configure it …” (P3). Therefore, the type of feedback must be configurable, as some types can be less enjoyable for some individuals; for example, children with ASD can, in general, find sound feedback unpleasant (American Psychiatric Association, 2013b).
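The per-child configurability described above could be represented as a small settings record. The following is a minimal sketch only: the class name `FeedbackConfig`, its field names, and the default values are assumptions made for illustration and do not describe the actual OPT software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackConfig:
    """Per-child feedback settings (hypothetical structure, for illustration)."""

    visual: bool = True   # feedback image on the display
    leds: bool = True     # multicolour LEDs
    sound: bool = True    # sound feedback
    haptic: bool = True   # haptic feedback

    def active_channels(self) -> list[str]:
        """Return the names of the enabled feedback channels, in a fixed order."""
        order = ("visual", "leds", "sound", "haptic")
        return [name for name in order if getattr(self, name)]


# Example profile: a child who finds sound feedback unpleasant.
quiet_profile = FeedbackConfig(sound=False)
print(quiet_profile.active_channels())
```

Keeping each channel as an independent switch mirrors the participants' request that any combination of feedback forms can be enabled or disabled for a given child.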

Considering the role of the robot, in particular the interaction procedure between the child and the robot, the participants agreed that when the robot starts telling a story, the picture that represents that story should be displayed on the OPT at the same time. When the robot then asks what the emotion was, the answer options should appear: “The image of the story would disappear, I think so.” (P8). Participant P9 shared the same opinion: “So the image would be there, but it would work the way you say, the answer options would only appear when you ask the question, after the question, and they would not appear before.” (P9).

Regarding the scenarios, in general, all participants suggested the development of new ones taking into account the aspects discussed during the focus group. Most participants preferred the cartoon scenario (marked as B in Fig. 3), but suggested adding the cartoon-looking robot and adjusting the number and spatial locations of objects and the colours used: “Personally, I think that B has (Hã) a more striking set of colours than A, no doubt, at the same time it could also be, have more distracting elements” (P3).

Table 3 shows the group’s overall conclusion about the ideas discussed during the focus group.

Table 3 Conclusions of the focus group

Figure 7 illustrates the interaction procedure between the robot and the child agreed upon by the focus group participants: the child interacts with the robot and the experimenter’s role is to supervise and intervene when and if necessary.

Fig. 7

The interaction diagram for the storytelling game scenario
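The interaction procedure agreed upon by the focus group can be sketched as a sequence of phases, each determining what the OPT displays. This is an illustrative sketch, not the authors' implementation: the names `Phase` and `opt_screen` are hypothetical, and hiding the story picture once the question is asked is an assumption based on P8's remark.

```python
from enum import Enum, auto


class Phase(Enum):
    STORY = auto()      # robot tells the story
    QUESTION = auto()   # robot asks which emotion was presented
    FEEDBACK = auto()   # robot and OPT react to the child's answer


def opt_screen(phase: Phase) -> dict[str, bool]:
    """Return which elements the OPT shows in a given phase (illustrative mapping)."""
    return {
        # Assumption: the story picture is shown only while the story is told.
        "story_picture": phase is Phase.STORY,
        # Answer options appear only after the robot has asked the question.
        "answer_options": phase is Phase.QUESTION,
        "feedback_image": phase is Phase.FEEDBACK,
    }


# Walk through one round of the storytelling game.
for phase in Phase:
    print(phase.name, opt_screen(phase))
```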

4.2 Think-Aloud Sessions

The scenarios were developed following the main considerations of the focus group participants. Then, they were shown to the experts, who were asked to verbalize their thoughts about them. Figure 8 shows an example of two scenarios that were designed and validated following both the focus group and the experts’ input/directions.

Fig. 8

Sample images of two storytelling game scenarios. The one on the left represents the emotion of anger. The one on the right represents surprise

The final design of the OPT (Fig. 9) was obtained following the main thoughts of the focus group and the specialists’ suggestions. It has a 5-inch display and an ergonomic design without any sharp edges, as suggested by the specialists.

Fig. 9

Final design of the OPT device

4.3 Online Questionnaire Results

The information gathered with the online questionnaire made it possible to verify the emotion matching accuracy of the three scenarios per emotion. Figure 10 summarizes the results: the chart shows the average emotion matching accuracy (%) of the three scenarios per emotion for adults and children. Since there are three stories for each emotion, the matching accuracy (%) was averaged over them for each emotion and target group (adults and children). In general, the average emotion matching accuracy was 91.8% for adults and 89.9% for typically developing children. The lowest value obtained (70.1%) corresponds to the emotion Surprise for children.

Fig. 10

Average emotion matching accuracy (%) for children and adults, per emotion
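The averaging step behind Fig. 10 can be illustrated with a short sketch. The function name `average_accuracy` and the input layout are assumptions, and the percentages below are placeholder values chosen for the example, not the study's data.

```python
from statistics import mean


def average_accuracy(
    per_story: dict[tuple[str, str], list[float]],
) -> dict[tuple[str, str], float]:
    """Average the per-story matching accuracy (%) for each (emotion, group) pair."""
    return {key: mean(values) for key, values in per_story.items()}


# Placeholder accuracies (%) for the three stories of one emotion; NOT study data.
example = {
    ("anger", "adults"): [90.0, 95.0, 100.0],
    ("anger", "children"): [80.0, 90.0, 100.0],
}

for (emotion, group), value in average_accuracy(example).items():
    print(f"{emotion:8s} {group:9s} {value:.1f}%")
```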

4.4 Pilot Study Results

A set of experiments was carried out in a school setting with four children with ASD who interacted with the robot in a triadic setup, playing the storytelling activity. The data obtained in the pilot study sessions show that, in general, the children gazed more often at the OPT device than at the robot (Fig. 11A). In particular, the children’s mean gaze time in the storytelling game scenario was significantly shorter for gazes directed at the robot (M = 5.28; SD = 1.76) than for those directed at the OPT (M = 12.58; SD = 2.97), Z = −3.059, p < 0.001 (Fig. 11B).

Fig. 11

Number of gazes (A) and mean gaze time (B) towards the robot and the OPT for the 4 children (A, B, C, and D) during the sessions
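The two gaze measures reported in Fig. 11, the number of gazes and the mean gaze time per target, can be derived from an annotated list of gaze events. The sketch below assumes a hypothetical event format of `(target, duration)` pairs with durations in seconds; the function name and the sample values are illustrative and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def summarise_gazes(
    events: list[tuple[str, float]],
) -> dict[str, tuple[int, float]]:
    """Return (number of gazes, mean gaze duration) for each gaze target."""
    durations: dict[str, list[float]] = defaultdict(list)
    for target, seconds in events:
        durations[target].append(seconds)
    return {target: (len(d), mean(d)) for target, d in durations.items()}


# Illustrative annotation of one session; NOT study data.
session = [("robot", 2.0), ("opt", 6.0), ("opt", 4.0), ("robot", 4.0), ("opt", 8.0)]

for target, (count, avg) in summarise_gazes(session).items():
    print(f"{target}: {count} gazes, mean duration {avg:.2f} s")
```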

The numbers of correct and wrong answers are shown in Fig. 12. All participants answered the robot’s prompts and, in general, there was a positive evolution in the number of successful answers throughout the sessions. On average, the children’s response time to the robot’s prompts remained about the same across sessions, at around 52 s (Fig. 13A). The number of times the children needed help was relatively low across all sessions (Fig. 13B), tending towards no help being needed as the sessions progressed.

Fig. 12

Answers of the children to the robot’s prompts during the sessions

Fig. 13

Top (A): children’s mean response time in seconds (with confidence interval) to the robot’s prompts during the four sessions. Bottom (B): number of times children needed help during the sessions

5 Conclusions and Future Work

ASD is a developmental disorder defined by diagnostic criteria that characterize individuals who have deficits in social communication, interaction, and generalization. It is known that these individuals usually show interest in technological devices. Therefore, research has been conducted with this target group using technological devices of different forms and designs. Some recent works using social robots and OPT with children with ASD demonstrate that robots produce a high level of encouragement and engagement in these individuals.

However, to the authors’ knowledge, there is no research considering the use of OPT as an add-on to the social robot. Thus, the research group proposed the new approach of using both OPT and social robots with the goal of promoting social communication with children with ASD (Silva et al., 2018). The developed work consists of defining the essential features of the new OPT device (including size and forms of interaction, among others) and evaluating the adequacy of the scenarios of a storytelling game, using qualitative and quantitative research methodologies in a four-step study: a Focus Group, Think-Aloud Sessions, an Online Questionnaire, and a Pilot Study.

Regarding the scenarios, in general, all participants in the focus group suggested the development of new ones taking into account the proposed aspects. Most participants preferred the cartoon scenario (marked as B in Fig. 3), but suggested adding the robot with a cartoon appearance and changing the number and spatial locations of objects and the colour palette used.

Concerning the new OPT device, participants agreed that the device should display different forms of feedback adapted for each child, i.e., it should be personalised. The members of the focus group also agreed that manipulation and interaction, i.e., the use of gestures or arrows on the screen to select the answer, should be configurable according to the child’s interests and skills. Finally, the group strongly agreed that a 5-inch touch display would be the most adequate to be used on the device.

The final designs of the OPT and the scenarios were obtained by conducting think-aloud sessions, following the feedback of the specialists and having in mind the results of the focus group. Finally, an online survey was conducted with 138 participants (children and adults). The average emotion matching accuracy was 91.8% for adults and 89.9% for typically developing children. Thus, in general, both children and adults correctly identified the emotion of each social story (Silva et al., 2020).

Following these results, a pilot study was conducted at the school with which the university has a cooperation protocol. Due to constraints such as the inherent characteristics of a population of children with ASD and the inclusion criteria, 4 children with ASD participated in the pilot study. The results indicate a positive trend, since the children understood the game mechanics and successfully interacted with the OPT. Furthermore, children’s attention was directed to both the robot and the OPT, indicating that both components were successful in captivating their interest. In general, the mean time children gazed towards the OPT was longer, suggesting that children relied on the OPT during this activity. The OPT displayed the visual cues of the story told by the robot. Moreover, in general, the children showed interest in participating in the activities, as they wanted to continue playing. They were also attentive to the feedback from both the robot and the OPT: the lights, the haptic feedback, and the images for correct and incorrect answers displayed on the OPT screen, as well as the facial cues, gestures, sounds, and verbal prompts of the robot (Silva et al., 2021).

The use of the OPT eliminates the need to have several visual cards, thus making the interaction seamless and dynamic. Additionally, the OPT device might offer a unique way to quantify a child’s behaviour without being too invasive (as, for example, the use of wearable sensors would be) and further feed the system so that the robot “knows” the level of engagement of the child. The evaluation results seem encouraging and suggest that this hybrid approach may have the potential to effectively support the teaching–learning process of emotional skills. From the point of view of pedagogical practice, it can be used as a complement to learning and, potentially, as an add-on to traditional approaches, diversifying support practices. It has been shown that robots can effectively engage children with ASD (Dautenhahn & Werry, 2004; Mwangi et al., 2018; Tapus et al., 2012). Thus, the present approach may offer a way of creating learning opportunities that involve the training of competences, for example via repetition, implying greater levels of attention and involvement in the task while providing adequate reinforcement (Scassellati et al., 2018; Suzuki, 2014). At the same time, it can promote higher levels of motivation (Dautenhahn & Werry, 2004; Mwangi et al., 2018; Tapus et al., 2012) and the child’s participation in the development of emotion recognition skills.

The different theories explaining ASD account for the inter-individual differences of children with ASD, justifying the need for more eclectic tools that focus on the characteristics of the particular child rather than the general characteristics of the disorder (Lombardo et al., 2010). By using social robots with specific capabilities, such as synthesis and recognition of facial expression, therapists can provide more precise and individualized support to children with ASD (Diehl et al., 2012). In fact, this inter-individual differentiation of the child with ASD has methodological implications, namely the need to structure and control the support environment, aspects that assistive technologies can promote and safeguard, allowing therapists to observe and analyse interactions and learning behaviours. These data can then be used to adjust support sessions and create an individualized learning curriculum (Wainer et al., 2014).

The proposed 4-step study methodology (focus group, think-aloud, online questionnaire, and pilot study) was used as an initial step to assess the adequacy of the proposed innovative hybrid approach of using a robot and an OPT device to promote interaction and emotion recognition skills in children with ASD. Moreover, the methodology provided insights into how to design storyboards for an HRI scenario. The sample of the pilot study allowed testing the constraints of the experimental design as intended. Nevertheless, as future work, and with the objective of assessing further constraints, drawing conclusions, and inferring results, more experiments will be conducted with a larger sample, leading to a more representative sample and results for this target group.

The process of designing tangible interfaces requires building a bridge between the physical and digital worlds, which presents both design and technological challenges. Furthermore, it is important to take into account the nature of the target group. The use of qualitative and quantitative research methodologies can be useful when designing new interaction tools. By making use of these research methodologies, it was possible to explore different technological tools in order to develop an innovative way to promote the process of supporting children with ASD.