1 Introduction

Touchless gestural interaction has been widely studied during the last decade, as one of the most promising solutions for allowing interaction with displays of various sizes [3]. In particular, prior work has investigated the use of such paradigms in public spaces, especially in the context of large public displays [6, 10, 37]. The most common issues in this context are interaction blindness, i.e., the inability of the users to guess that a display is interactive [28], and affordance blindness, that is, the inability to understand how to interact with it [9]. The latter is particularly relevant for touchless-enabled displays that are often mistaken for the more common touch-based ones.

In such contexts, visual interfaces play a key role both before and during the actual interaction. Indeed, an appropriate interface could strongly contribute to addressing both interaction blindness and affordance blindness, and could make the interaction itself more intuitive and straightforward. Many prior works have suggested the use of avatar-based interfaces, where a predominant human-shaped entity continuously reproduces user movements [19, 26, 39, 40]. These studies revealed the effectiveness of silhouettes, mirrored images, or avatars in communicating the supported interactivity and its touchless nature. Considering such advantages of avatar-based interfaces in supporting adults’ interaction, we aim to understand whether similar effects also occur in child–display interaction. In fact, prior work showed how touchless gestural interfaces may facilitate learning in children [1, 29], and it is no coincidence that such interfaces have been widely adopted in serious games [5, 14]. For these reasons, in our research we investigate the multiple facets of child–display touchless gestural interaction mediated by an avatar-based interface.

In this paper, we present the outcomes of our research that help identify guidelines for improving the effectiveness of avatar-based touchless gestural interfaces on large displays for children. In particular, we investigate design issues that could affect children’s experiences at different ages, in terms of engagement and enjoyment, effectiveness in recalling the content, and style of interaction with the avatar-based interface.

The rest of the paper is structured as follows: Section 2 provides an overview of related work in the fields of touchless gestural interfaces for pervasive displays and child–computer interaction; Section 3 describes the technical apparatus and how we used it to conduct the study; Section 4 summarizes the results; Section 5 provides the main lessons learned after analyzing the results; Section 6 concludes the paper, providing informed insights for future work.

2 Related work

This work builds upon prior HCI work, within research areas such as touchless gestural interfaces for pervasive displays and child–computer interaction. This section provides an overview of the related work that guided our research.

2.1 Gestural interfaces and the use of avatars

In the context of pervasive displays research, many touchless gestural interfaces have been proposed and implemented. They have been used to interact with 3D virtual objects [7], to access information provision systems [8, 19], to create and support playful interactions [26], and in many other applications. The use of touchless gestures, especially when applied to public displays, has many advantages. Among them, touchless gestural interaction limits vandalism by allowing the display to be placed in unreachable places [37], keeps the screen surface hygienic [20], and removes constraints on display size (see for instance works on wall-sized displays [6] or media façades [10]).

Walter et al. [40] focused their work on describing existing solutions, found in the literature, for user representation in touchless gestural applications. They categorized this body of knowledge according to three possible options: using only hand-shaped cursors, using avatars, and using the user’s silhouette. Recently, other works have focused on the use of silhouettes or avatars [19, 26], since they have proved to be very effective in solving some common pervasive display issues, namely interaction blindness (i.e., the inability of users to recognize the interactive capabilities of a display [28]) and affordance blindness (i.e., the inability to understand the interaction modality of the display [9]). Gentile et al. also showed that the presence of an avatar makes two-handed interactions more “natural” in the sense that it contributes to a reduction of the cognitive workload while interacting with public displays [18]. However, to the best of our knowledge, prior work has not focused on the use of avatar-based solutions by a specific class of users, such as children. For instance, Müller et al. showed that interactivity can be recognized after less than 3 s using avatars [26]; however, their work considers users of various ages, without an in-depth analysis of children’s behaviors. Similar limitations can be found in [9, 18, 19, 39], where the capability of avatars and silhouettes to attract users and improve their experience has not been evaluated with a special focus on children. In this paper, we aim to fill this gap by studying how children interact with avatar-based interfaces.

2.2 Gesture-based interfaces for children

Literature on avatar-based interaction has shown how this paradigm could be beneficial in supporting interaction with large displays for adults. In the context of child–computer interaction, other facets of touchless gestural interaction have been investigated, mainly in order to understand the effect of this novel paradigm in learning activities [1, 5, 29], but also for gaming purposes [4, 5, 13]. Other studies are more concerned with understanding how to design touchscreen interactions [36].

Several applications described in the literature use users’ silhouettes [15, 43], but some authors have opted for either “stick-man”-shaped avatars (e.g., Tweetris [13]) or more customizable avatars [4, 5].

Adachi et al. showed that full-body interaction promotes a sense of immersion in children [1]. Bailey et al. also showed that the customizability of avatars might make the gameplay experience more enjoyable [4]. Bartoli et al. showed that motion-based touchless games may have a positive impact in improving the learning capabilities of autistic children [5].

The ability to facilitate learning has been described more generally as an effect of engaging interactions [2, 16]. Moreover, the relation between enjoyment and engagement in interaction with natural user interfaces is well known, as shown for instance in [24, 33, 42].

Although there is extensive literature on touchless gestural interaction with children, to the best of our knowledge, avatar-based interfaces have not been thoroughly studied with children in terms of challenges relevant to the pervasive display community, such as affordance blindness, or two-handed interaction.

2.3 Children’s cognitive development

In our study, we focused on participants aged from 2 to 10 years. Within this age range, a number of cognitive abilities develop significantly, such as executive functioning [23], visual and spatial perspective-taking (i.e., the ability to perceive a situation from another’s point of view [12]), counterfactual thinking [34], and theory of mind (i.e., the ability to understand other people’s mental and emotional states [30]). For example, 3- and 4-year-old children often experience difficulties adopting the perspective of others in perceptually based tasks [32] and communication tasks [25]. This is particularly relevant in the case of interfaces based on avatars, where children have to recognize the avatar as a representation of themselves.

According to Piaget’s four stages of cognitive development [31], children in this age range are in either the pre-operational (2–7 years old) or the concrete operational (7–11 years old) stage. These two stages are characterized by different abilities and needs. In the pre-operational stage, children are still in their egocentric phase in terms of their ability to communicate, and they have difficulty in taking the perspective of other people (children and adults), including their emotions. In the concrete operational stage that follows, children start thinking logically about concrete events and become more organized, but their thinking remains very concrete. Children’s egocentrism tends to disappear; however, they still struggle with abstract concepts.

Therefore, the use of a developmental perspective is necessary to better understand how the progression of cognitive abilities in the early and middle stages of child development influences the style of child–avatar interaction. This information is important because it could facilitate the creation of innovative and effective child–display interactive design approaches.

3 Study description

The study was run during a summer public engagement event organized by the University of Lincoln, UK. The spirit of the event was to introduce children to science and to stimulate their curiosity, as well as to showcase research outputs to families in the local community. Thus, for the university, it was a public engagement action as well as a way for scientists to collect data from real users. During the event, children and parents were invited to play with different research showcases. All were informed about the specific purpose of each study and were aware of the researchers collecting the data. Researchers had to apply for an ethical consent form before the event, and inform each participant (and the parents) of the study. The event lasted 5 days, with eight different sessions and a total of 24 showcases. Each showcase was installed in a separate room in order to allow researchers to collect data properly and to allow participants to have their own space. Children who attended the event could decide to participate in one or more studies.

In this context, we conducted our study with the aim of investigating some of the design issues that have an impact on improving children’s experiences. In particular, the main purposes of this exploratory study are:

  • To study whether an avatar (movable or immovable) provides interactions that are intuitive for children and therefore help to overcome affordance blindness

  • To study whether an avatar-based touchless interface makes children’s experiences engaging and enjoyable, thereby improving recall of content provided through the interaction (learning about art)

3.1 The interactive art jigsaw

In order to explore our main research questions, we designed an Interactive Art Jigsaw. By using mid-air gestures, the user has to complete two different jigsaws. The first one is initially filled with all but one piece (see Fig. 1). This remaining piece is placed randomly on the left or right side of the jigsaw. When the user completes this first jigsaw, a smiling face is shown to confirm its completion. Then, when the researcher presses a key on the keyboard, a second jigsaw is shown, where all six pieces are arranged at the sides (see Fig. 2). Right after the user correctly completes the jigsaw, a video is automatically played providing additional textual and audio information about the painting on the jigsaw (author’s age, name, origins, and where the painting is exhibited).

Fig. 1

The first puzzle consists of completing a jigsaw that is all filled except for the last piece. This first task was used for the initial training phase

Fig. 2

The second jigsaw, shown right after the completion of the previous one

We chose these paintings because they were made by children and exhibited in a children’s museum. Moreover, the paintings’ content is suited to the age of our participants, and the artwork is aesthetically pleasing.

The first jigsaw was intended to serve as an initial training phase for children, in order to let them understand how to interact with it (see Section 3.5). Then, the second jigsaw can be completed after they have learned how to use the interface to solve the jigsaw.

The interaction with the jigsaw pieces was based on a virtual avatar shown in the middle of the screen, which continuously replays a user’s movements. By driving one of the hands of the avatar on top of a tile, the user could then close her hand into a fist in order to initiate the dragging of the piece, which continues until the user opens her hand again. As explained in Section 2.1, the presence of the avatar allows the interactions to be more natural [18] and should facilitate users in understanding how to interact with the system (i.e., addressing affordance blindness) [26].
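The grab-and-drag behavior described above can be sketched as a small state machine. This is only an illustrative sketch, not the authors' implementation: the class and parameter names (`Piece`, `DragController`, `hand_closed`, the tile size) are hypothetical, and the hand position and open/closed state are assumed to be supplied once per frame by the body-tracking sensor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Piece:
    x: float
    y: float
    size: float = 100.0  # hypothetical tile size in pixels

    def contains(self, hx: float, hy: float) -> bool:
        """True if the point (hx, hy) lies on this tile."""
        half = self.size / 2
        return abs(hx - self.x) <= half and abs(hy - self.y) <= half


class DragController:
    """Tracks which piece, if any, the avatar's hand is dragging."""

    def __init__(self, pieces: List[Piece]) -> None:
        self.pieces = pieces
        self.dragged: Optional[Piece] = None
        self._was_closed = False

    def update(self, hand_x: float, hand_y: float, hand_closed: bool) -> Optional[Piece]:
        # A grab starts only on the open-to-closed transition over a tile.
        just_closed = hand_closed and not self._was_closed
        self._was_closed = hand_closed

        if self.dragged is None and just_closed:
            self.dragged = next(
                (p for p in self.pieces if p.contains(hand_x, hand_y)), None
            )
        elif self.dragged is not None and not hand_closed:
            # Opening the hand drops the piece where it is.
            self.dragged = None

        if self.dragged is not None:
            # While the fist stays closed, the piece follows the hand.
            self.dragged.x, self.dragged.y = hand_x, hand_y
        return self.dragged
```

Requiring the open-to-closed transition, rather than just a closed hand, is a design assumption here: it prevents a fist that sweeps across the screen from picking up pieces unintentionally.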

3.2 Interaction modalities

Considering the broad age range and abilities of our participants, we were aware that it would be quite challenging to design an interaction that could be stimulating and at the same time easy to use for such an age range. The design of the Interactive Art Jigsaw was based on a previous deployment [17] and adapted according to the specific needs of our users (children). In particular, we designed the system to implement two different interaction modalities, which are the two conditions that we compared in our study:

  • immovable, i.e., the avatar is always shown in the middle of the screen, regardless of the user’s relative position: if the user moves to the left or to the right, the avatar remains always in the center, replaying only the arms and hand gestures;

  • movable, i.e., the avatar can be moved horizontally, replaying also the user’s body movements to the left and/or to the right.

Prior work used a movable avatar in order to better reproduce users’ movements, which in turn should help in communicating touchless interactivity [26]. On the other hand, Gentile et al. [19] showed that an immovable avatar allows the avoidance of some typical issues of touchless gestural interaction (e.g., the live-mic problem [41]).

However, to the best of our knowledge, how those two modalities affect children–display interaction has not been studied before.
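The difference between the two conditions can be illustrated with a minimal sketch of how the avatar's horizontal screen position might be derived from the user's lateral position. The function name, the tracked half-width of 1.5 m, and the sign convention (positive offsets toward the right edge of the screen as seen by the user) are assumptions made for illustration only; they are not taken from the deployed system.

```python
def avatar_center_x(
    user_x_m: float,
    screen_width_px: float,
    movable: bool,
    tracked_half_width_m: float = 1.5,  # hypothetical lateral tracking range
) -> float:
    """Return the avatar's horizontal position on screen, in pixels."""
    center = screen_width_px / 2
    if not movable:
        # Immovable condition: the avatar stays centered; only arm and
        # hand gestures are replayed.
        return center
    # Movable condition: the lateral offset is clamped to the tracked range
    # and mapped linearly onto the screen width.
    clamped = max(-tracked_half_width_m, min(tracked_half_width_m, user_x_m))
    return center + (clamped / tracked_half_width_m) * center
```

For example, on a 1920-pixel-wide screen a user standing 0.75 m to one side would see the avatar at 1440 px in the movable condition and at 960 px in the immovable one.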

3.3 Technical apparatus

The Interactive Art Jigsaw system used for our study consisted of a 55” LCD display placed at eye height, connected to a computer with a i5-6500 Quad Core Processor @ 3.20GHz, 500GB HDD capacity, 16GB RAM and Nvidia GeForce GTX 970 graphics card. A Kinect for Xbox One was placed above the display and connected to the PC, in order to gather information on users’ body gestures, using the Microsoft Kinect SDK v2.

In order to allow users to interact within the Kinect field of view, we arranged tables around it to constrain interactions within 3 m of the camera. Figure 3 depicts the setup.

Fig. 3

Experimental setup: display, Kinect and tables

We also conducted several trials with children of different ages (ranging from 3 to 10 years old) in order to verify the compliance of this apparatus with the limited heights of our users, especially with regards to recognition capabilities. Based on these preliminary observations, we decided to set the display height at 80 cm from the ground. The appropriateness of this choice has been confirmed by subsequent tests, since no recognition issues were detected during our study.

3.4 Participant selection and recruitment

The event hosted 220 children (F = 102, M = 118) aged from two to ten (2 = 1%, 3 = 11%, 4 = 9%, 5 = 14%, 6 = 16%, 7 = 15%, 8 = 10%, 9 = 15%, 10 = 8%). In 5 days, we ran eight sessions with an average of 27.5 participants each. Participants arrived at the beginning of the session and played with other children and parents in a common room. A team of two researchers recruited participants by randomly selecting them from the crowd. Participation was voluntary. The researcher approached the child, explained the study to the child and the parents, and asked them to participate. If the child agreed, they moved into the room with the installation; if not, the researcher acknowledged this and asked someone else. Prior to the study, faculty ethical approval was obtained. The event organizers informed parents and obtained their consent. Children could decide to withdraw at any moment. Researchers acted as facilitators and made sure children did not feel under pressure but were comfortable with and enjoyed the activity.

A total of 107 children (F = 54, M = 53) played with our Interactive Art Jigsaw. The age groups were quite evenly distributed: 2 = 0.9%, 3 = 6.5%, 4 = 8.4%, 5 = 15.0%, 6 = 16.8%, 7 = 15.9%, 8 = 15.0%, 9 = 16.8%, 10 = 4.7%. In each session, an average of 13.4 children participated.

3.5 Procedure

At the beginning of each test, children were invited one at a time to enter a room where our system was deployed. In a few cases, we involved parents in order to make the child feel comfortable. Then, the child received instruction about the study sessions, each of which consisted of completing a jigsaw: in the first one, that we used as a training session, they had to figure out how to interact with the system. To this end, they were requested to place one missing piece, out of six, in the right position (see Fig. 1). In the subsequent session, referred to from here on as task session, a second jigsaw was shown, and the children had to place all six pieces in the right positions to complete it (see Fig. 2).

The first session was intended to serve as a training session in order to let the children guess and understand how to use the interface, which means how to grab and move the missing piece. In this session, the team did not provide any instruction on how to use the interface. They only explained the final goal to the child without mentioning or unveiling any aspect of the interaction modality. The team monitored the child and if researchers noticed that the child showed signs of stress because s/he was unable to understand how to interact with the display, an experimenter would give a suggestion every 30 s, such as:

  1. “step back”: the experimenter used this suggestion if the child tried to touch the display, since standing so close to the display prevented the avatar from being visible;

  2. “try to move your arms and hands”: the experimenter used this suggestion if the child noticed the avatar, but missed the interactivity;

  3. “grab a piece”: the experimenter explained to the child how to grab a piece, i.e., close the hand into a fist after having driven the avatar’s hand on top of the piece;

  4. “mimic”: the experimenter gave explicit instruction on how to use the child’s body by enacting the interaction and asking the child to mimic her behavior.

At the end of the training session, if the child needed at least one suggestion, the experimenter asked them to perform the task again in order to make sure that the child understood how to properly interact with the interface.

The second session was intended to assess the children’s experience in terms of engagement, enjoyment and learning. To this end, a second jigsaw was shown, and the child was asked to complete it. At the end of this second session, the child was asked to watch a video. The video concerned the painting in the last jigsaw and provided information about both the author and the artwork. This information was delivered as audio and as text (subtitles) so that deaf children could be included if necessary. Indeed, because the study was run during a public event, we did not know the composition of the user sample in advance. In the end, no deaf users took part in our study.

After the completion of the sessions, participants were asked to take part in a semi-structured interview, assessing their experience in making the jigsaw.

3.6 Methods, data collection, and analysis

Data were collected from different sources: demographic information from the organizers of the event, notes taken by the two researchers during the task execution, and a semi-structured interview. To analyze the data collected, we adopted a mixed approach by merging qualitative and quantitative methods.

The event organizers provided anonymous demographic information collected from each child. All the other data were collected by the two researchers. We discarded all the data of children who dropped out of the study.

During both the training and the task sessions, the two researchers took notes of the duration of the interaction and other notable events. In terms of quantitative data, we collected the number and variety of suggestions given to the users during the first task, as well as the times required for completing both the sessions.

We also collected qualitative data based on observations, particularly focusing on: the use of one or two hands during the interactions, if the users tried to touch the display, occasional parental intervention (e.g., giving significant instructions or help), and other relevant information observed.

The semi-structured interviews aimed at allowing children to self-report their experience. Questions were asked by a researcher with extensive experience in child–computer interaction. Children could ask for explanations, and the researcher provided more details on those aspects that were more complex or that contained words that might be difficult to understand, e.g., “gesture-based game”. In addition, in order to keep the child focused, we opted for a very short interview:

  Q1. How much did you enjoy making the jigsaw?

  Q2. Would you recommend it to other children?

  Q3. What did you like about making the jigsaw?

  Q4. What did you not like about making the jigsaw?

  Q5. Have you ever made a jigsaw?

  Q6. Have you ever used a gesture-controlled game (e.g., Kinect or Wii controlled games)?

  Q7. Do you remember something about the last painting/image? If yes, what?

  Q8. Do you remember something about the author? If yes, what?

We did not ask explicit questions on touch screen interaction experience because we wanted to keep the interview short and it was not relevant to the main research scope. In Q1 and Q2, we asked children to give a score from 1 (not at all) to 5 (very much). The scale was associated with emoticons, i.e., using a so-called smile-o-meter (based on [21]). Q5 and Q6 required a Yes/No answer. All the other questions were open, and researchers took notes reporting the children’s answers verbatim. Answers to Q7 and Q8 were then coded by the two researchers, who assigned a value ranging from 0 (i.e., the child does not remember) to 3 (i.e., the child remembers at least three different details of the video displayed at the end of the second activity or the painting). The resulting values served as two learning indexes, providing a quantitative indication of how much children recalled about the content. We computed Cohen’s kappa coefficients to evaluate inter-rater agreement for the coded answers to Q7 and Q8. Results indicated at least substantial agreement for both Q7 (κ = 0.6547) and Q8 (κ = 0.8566) [22].
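As an illustration of the inter-rater agreement measure mentioned above, the following minimal sketch computes unweighted Cohen’s kappa for two raters who assign scores from 0 to 3. Whether the authors used the unweighted or a weighted variant is not stated, so the unweighted form is an assumption; the function name and the example scores are made up for illustration and are not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions per code.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[code] * count_b[code] for code in count_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical recall scores (0-3) from two coders; not data from the study.
coder_1 = [0, 1, 2, 3, 1, 2, 0, 3]
coder_2 = [0, 1, 2, 3, 1, 1, 0, 2]
kappa = cohens_kappa(coder_1, coder_2)
```

Kappa corrects the raw agreement rate for the agreement that two raters would reach by chance given how often each of them uses each code, which is why it is preferred over simple percentage agreement for coded interview answers.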

Task duration and children’s quantitative answers were analyzed separately using a quantitative approach. Children’s answers to questions Q3 and Q4 were transcribed and coded by the two researchers separately, in an inductive and deductive way. The main themes developed from the answers were organized with the other data to provide an overview of our findings and to answer our research questions.

In the data analysis phase, we generally grouped children by age according to their school level: 2–4, 5–7, and 8–10. This choice is supported by literature on the ability to operate interactive systems as well as the ability to recall information [23]. In other cases, when presenting our results, we divided children into two groups (2–4 and 5–10 years) to better highlight differences due to the impact of schooling. Children aged 5 to 10 years are facilitated in recalling the video information (images, audio, and text) since they can read and write.

3.7 Threats to validity

3.7.1 Internal validity

It should be noted that two researchers were present during the study with children and might have introduced the classic “experimenter bias” by inadvertently helping children when they were lost. To mitigate this risk, a strict protocol was put in place, with timed instructions replicated for all participants as described in Section 3.5. Nevertheless, we wanted the experience for children to be memorable and positive, so some extra help might have been inadvertently provided; when help was given, it was always reported in the data collection.

3.7.2 External validity

The results of the study can be partially generalized since the university environment, a classroom with an interactive display resembling a digital whiteboard, could represent a common setting in primary schools. Nevertheless, by running the study during a public event, some bias towards positive participation and engagement could have been introduced. For instance, some parents were present, encouraging their children, and furthermore, by volunteering to participate in this study, children’s attitudes might have been skewed towards learning and engagement from the start. We reported those few cases where parents were present during our study but generally, children, especially of older age, were not accompanied by parents.

3.7.3 Construct validity

We assumed that children would understand the drag-and-drop action as a form of tile manipulation with their hands, but the interaction could have been easier for children with previous experience of touchless technologies, e.g., Nintendo Wii or Microsoft Xbox consoles. We carefully recorded such data in the demographics so that these effects could be considered when collecting evidence and conducting the data analysis. This was an approximation and we might have missed other important factors, e.g., hand coordination issues caused by the drag-and-drop gesture, especially in younger children.

4 Results

The goal of this study is twofold:

  • To study whether an avatar (movable or immovable) provides interactions that are intuitive for children and therefore help to overcome affordance blindness

  • To study whether an avatar-based touchless interface makes children’s experiences engaging and enjoyable, thereby improving recall of content provided through the interaction (learning about art)

In this section, we present the results by merging data from quantitative and qualitative analysis to provide evidence in accordance with the goals of this study.

4.1 Children interacting with the avatar

Children interacted with our system for 82.80s on average (st. dev. 62.49s) during the training session, and for 80.41s on average (st. dev. 57.22s) during the subsequent task session. Figure 4 shows the duration histogram for both the training (left) and task (right) sessions. As expected, the training duration showed a higher variability because most of the children did not yet know the interaction modality. The training session reduced variability in the task session by allowing the users to understand how to properly interact with the system, as suggested by the narrower histogram.

Fig. 4

Histograms of training and task duration.

4.1.1 One-handed vs. two-handed interactions

During our observations, we noted that some children preferred to use only one hand, while others opted for using both hands (see Fig. 5). In particular, 41% interacted with only one hand, while 59% used both hands.

Fig. 5

On the left, an example of a child trying to grab and drag a piece using only one hand. On the right, another child tries to use both arms to interact with the avatar

Table 1 shows that the moving avatar allowed completion of the task in less time (71.41s). In addition, the best performance in terms of task duration was achieved by using two hands with the immovable avatar (79.45s) and using one hand with the moving one (58.73s). We used a factorial ANOVA to test whether avatar state (movable or not) and the number of hands had a significant effect on task duration. Results indicate that the overall model is statistically significant (F(3,94) = 4.43, p < 0.01). In particular, the state of the avatar (movable or not) is also statistically significant (F(1,94) = 4.25, p < 0.05), as is the interaction between the avatar state and the number of hands (F(1,94) = 10.84, p < 0.01).

These findings show that using a moving avatar allows children to stick to a single hand, which lets them interact more effectively with the avatar. This is in line with previous work by Walter et al. [40], who showed that adults usually tend to stick to one hand while interacting with large displays. Using an immovable avatar does not allow the child to interact effectively with the interface, unless users decide to use both hands.

We therefore analyzed the preferences in terms of number of hands used. Data showed that the majority of the younger and older children preferred to use two hands, while children aged 5–7 used one hand slightly more than two hands (see Table 2). A Chi-square test confirmed that age has a statistically significant effect on the use of both hands during the interactions (χ2(2) = 6.8914, p < 0.05).
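A test of this kind can be sketched in a few lines. The sketch below computes Pearson’s chi-square statistic for a contingency table and, for the specific case of 2 degrees of freedom (three age groups by two outcomes), the p-value via the closed form exp(−x/2) of the chi-square survival function. The function names are hypothetical, and the raw counts behind the reported statistic are not given in the text, so only the reported value of the statistic is used here.

```python
import math
from typing import Sequence


def chi_square_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees
    of freedom, for which the survival function is exp(-x / 2)."""
    return math.exp(-stat / 2)


# Reported statistic for age group vs. number of hands (df = 2).
p_hands = p_value_df2(6.8914)
```

With the reported value of 6.8914, this closed form yields a p-value of roughly 0.03, which is consistent with the stated p < 0.05.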

Analyzing the differences between the two conditions (Table 3), we found that the youngest used mainly two hands in both avatar states (movable vs. immovable). Conversely, older children tended to use both hands when the avatar was immovable, and only one hand with the moving avatar.

Table 1 Avatar positioning and number of hands vs. average task duration

In light of these findings, the avatar state (movable or not) should be chosen carefully when accommodating younger children, since their general user experience may be negatively affected by their less developed cognitive abilities. Younger children may decide to stop the activity, instead of trying to explore new ways of interacting. This is also confirmed by our observations: 18% of children aged 2–4 years did not complete the task. This percentage drops dramatically to less than 1% for older children. The dropout happened in two ways: the researcher identified signs of stress in the child and asked if s/he wanted to withdraw, or the child spontaneously decided to drop out. One of the two researchers has long-standing experience in child–computer interaction, and part of his role was to make sure that children felt comfortable during the interaction.

4.1.2 Affordance blindness

During our tests, we wanted to understand whether the avatar interaction modalities were self-evident. However, we noted that 50% of the users started the interaction session by trying to touch the display (see Fig. 6), despite the presence of the avatar on the screen. Looking more in depth at the data, we did not notice a statistically significant effect of the avatar condition (movable vs. immovable) on children attempting to interact by touch. However, a Chi-square test showed that age has a significant effect (χ2(2) = 9.8090, p < 0.01) on attempting to interact by touch: 76% of children aged 2 to 4 years did not try to use touch at the beginning of the interaction session, whereas this percentage drops to 35% for children aged 5–7 years and then rises again to 60% for children aged 8–10 years. Even if we do not have direct feedback from children and parents about this, it is highly likely that this behavior is related to some previous exposure of children to touch technology, in particular to the use of smartphones and tablets [27]. This is particularly true for the age range 5–7 years, whereas younger children are less biased. As for older children, they could have guessed the touchless gestural nature of the interface by noticing the presence of the Kinect on top of the display during the experiments, which is a well-known gaming device in that age range. This is supported by the observation that two-thirds of children aged 8–10 years stated that they had some previous experience with gestural games. A further analysis showed that 75% of all children who tried to interact by touch also stated that they had never had experience with touchless gestural games.

Fig. 6 A child trying to interact by touching the display

Table 2 Age ranges vs. number of hands
Table 3 Age ranges and avatar positioning vs. number of hands

Another interesting finding of our study is about the time needed by children to understand the interaction modality. We estimated this time starting from the total time needed to complete the training session. In more detail, the total time needed to complete one training session included both the time to figure out how to interact, and the time to complete the dragging of a single piece. The latter can be in turn estimated as one-sixth of the time needed to complete the task session since it consisted of six single piece dragging tasks. Consequently, we estimated the average time required for understanding how to interact with the interface as the difference between the average training time (82.80s) and one-sixth of the average task duration (80.41s). The resulting estimated time needed to understand how to interact was 69.40s.
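The estimate above can be reproduced with a few lines of arithmetic. The following is a minimal sketch; the function and parameter names are illustrative assumptions and are not part of the study materials, while the input values are the averages reported in the text.

```python
def estimate_understanding_time(avg_training_s: float,
                                avg_task_s: float,
                                pieces_per_task: int = 6) -> float:
    """Estimate the time spent figuring out how to interact.

    The training session consists of understanding the interaction plus
    dragging a single piece. The time to drag one piece is approximated
    as the average task duration divided by the number of pieces.
    """
    per_piece_s = avg_task_s / pieces_per_task
    return avg_training_s - per_piece_s


# Averages reported in the text (seconds).
estimate = estimate_understanding_time(avg_training_s=82.80, avg_task_s=80.41)
print(f"Estimated time to understand the interaction: {estimate:.2f} s")
```

Note that this estimate relies on the assumption that dragging a piece during training takes about as long as dragging one during the task session.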

4.1.3 Age and performance

Regarding the participants’ age in relation to the number of suggestions needed during the training and the time required for completing the tasks, we noted a fairly homogeneous trend (see Fig. 7). Older children generally required fewer suggestions and less time to accomplish the tasks. This is readily explained by a child’s physical and cognitive development.

Fig. 7 Trends of number of instructions, training time, and task time compared to age

In particular, a Kruskal–Wallis test showed a significant effect of age on task duration (χ2(8) = 32.841, p < 0.01), as well as on training duration (χ2(8) = 23.282, p < 0.01) and number of suggestions given during the training phase (χ2(8) = 25.841, p < 0.01). This is probably a direct consequence of different cognitive abilities, which have a primary impact on performance.
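As a sanity check on the reported significance levels, the upper-tail probability of each statistic can be computed directly. The sketch below assumes the usual chi-square approximation of the Kruskal–Wallis statistic with 8 degrees of freedom, and uses the closed form of the chi-square survival function that holds for even degrees of freedom. The function name is an illustrative assumption; only the three statistics are taken from the text.

```python
import math


def chi2_upper_tail_even_df(statistic: float, df: int) -> float:
    """Return P(X >= statistic) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
        Q(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    if statistic < 0:
        raise ValueError("the statistic must be non-negative")
    half = statistic / 2.0
    partial_sum = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * partial_sum


# Statistics reported in the text, each with 8 degrees of freedom.
reported = {
    "task duration": 32.841,
    "training duration": 23.282,
    "number of suggestions": 25.841,
}
p_values = {name: chi2_upper_tail_even_df(h, df=8) for name, h in reported.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.5f}")
```

Under this approximation, all three probabilities fall below the 0.01 threshold stated in the text.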

4.2 Fostering enjoyment and effect on recall

4.2.1 Children’s enjoyment when making the jigsaw

In order to give a more complete overview of the trends emerging from the data analysis, we merged qualitative and quantitative data.

First of all, the age group has a significant effect on enjoyment. A Kruskal–Wallis test confirmed the statistical significance of this effect (χ2(2) = 5.467, p < 0.05).

Figure 8 shows the outcome for each individual age, since the trend was more evident than when grouping the data into the three age ranges.

Fig. 8 Enjoyment levels according to age (no data available for children of age 2: none of them wanted to answer the questionnaire)

It seems that very young children and the oldest in the study did not much enjoy the jigsaw. This could be due to several factors, which may concern children’s physical and cognitive skills as well as their individual preferences. We expected this effect for very young children (aged 2–4 years), since it is well known in the literature that at this age they have limited physical and cognitive abilities [23]. As for older children (7–10 years), the main reason for the drop in enjoyment remains unclear. A possible explanation was given by two children (ID17136, ID765), who mentioned that they would like to play with a jigsaw with more pieces, like the ones they usually have at home. However, this effect needs to be studied in more depth.

In order to understand enjoyment more in depth, we looked at questions Q7 and Q8 by performing a thematic analysis. The codes were categorized into nine sub-themes and four main themes:

  • general appraisal of the activity and the paintings

  • enjoyment of being physically engaged with the interface

  • the challenge of understanding the interaction modality

  • the avatar

General appraisal of the activity and the paintings

Children really liked the paintings in the two jigsaws. They described the experience using positive adjectives such as fun, cool, fantastic, clever. Specifically, one child mentioned that it makes him “feel positive feelings” (ID1763). They also appreciated solving the jigsaw and “mixing the pieces” (ID1746). These general statements provided strong evidence that making the jigsaw was pleasurable and the paintings we had selected were appreciated.

Enjoyment of being physically engaged with the interface

A child (ID17117) told us that he likes how the system “gets people moving instead of staying on the floor”, and another one (ID2181) said that she likes to “control using your hands”; a third said “I like to use my hands instead of the mouse” (ID602). Often, they cited “using my body” (ID765) and “moving my hands” (ID1505). In other cases, they were very specific (e.g., “grabbing the pieces and drag in the empty spaces”, ID17147). In addition, children mentioned specific hand movements such as “squeeze” and “pick up”. The data showed that children really enjoyed the type of interaction based on body movement, and the hand gestures of pointing, squeezing, and moving the piece into the right place.

The challenge of understanding the interaction modality

Children showed that they appreciated having to put some effort into understanding how to interact with the interface, e.g., “You have to find your way to play with it” (ID17136), “I like understanding how to move in the space” (ID1128), “It was sort of creative, you can create your movement” (ID1770). They described this first moment as “tricky” and “challenging”, but in a positive way: “It starts tricky and it goes easy” (ID17115). In addition, they appreciated the way the interaction modality makes them “think” (ID1722) and “stayed concentrated” (ID17103). Thus, although the interaction is difficult to understand initially, discovering the interaction modality made the children more engaged in playing. It is worth noting, however, that the difficulty in understanding how to interact with the display was also mentioned as a negative aspect by a few children (IDs: 1711, 1605, 1604, 287).

The avatar

The way in which children mentioned the avatar gives us an indication of how they interpreted it. Children referred to the avatar in two main ways: as an external agent, or as “myself”. In the first case, children saw the avatar as a different person or entity: “the person on the screen is copying me” (ID17129), or “you have to stand back and the person that was on the screen wants me to grab the pieces” (ID1763). These children are aged 2–5 years, and at this age they have not yet developed the ability to understand other people’s mental and emotional states [30].

Older children (6–10 years) interpreted the avatar as “the copy of me” (ID256) or mentioned that “you can move and pretend to be a robot” (ID1720), “it shows your body” (ID1766). However, both younger and older children expressed appreciation of being able to play using the avatar.

4.2.2 Enjoyment and recalling content

As mentioned in Section 2.2, previous studies have shown that enjoyment has a significant impact on recall [2, 16, 42]. Based on the data we collected during our observations, we noticed a similar relation between enjoyment and the recall indexes resulting from the answers to questions Q7 and Q8 (see Section 3.6). In particular, we grouped the enjoyment level into two categorical values (enjoy vs. not-enjoy), counting as positive all the cases where the level of enjoyment was ranked as 4 or above. This binary choice is supported by the use of a smile-o-meter, where smiling faces corresponded to 4 and 5 points on the Likert scale. Using this categorization, two Wilcoxon–Mann–Whitney tests showed a significant effect of enjoyment on both Q7 (z = −3.896, p < 0.01) and Q8 (z = −2.930, p < 0.01). This means that children who enjoyed making the jigsaw were then able to better recall information, which is in line with prior work [2, 16, 42].
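The binary grouping described above can be sketched as follows. This is only an illustration of the categorization step: the function names, the assumption of a 1–5 scale, and the sample records are hypothetical and do not come from the study data.

```python
ENJOY_THRESHOLD = 4  # smiling faces on the smile-o-meter map to 4 and 5


def enjoyment_group(score: int) -> str:
    """Map a smile-o-meter score (assumed to range from 1 to 5) to a binary label."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    return "enjoy" if score >= ENJOY_THRESHOLD else "not-enjoy"


def split_recall_by_enjoyment(records):
    """Group recall indexes by the binary enjoyment label.

    `records` is an iterable of (enjoyment_score, recall_index) pairs.
    """
    groups = {"enjoy": [], "not-enjoy": []}
    for score, recall in records:
        groups[enjoyment_group(score)].append(recall)
    return groups


# Hypothetical, made-up records used purely to show the grouping.
sample = [(5, 3), (4, 2), (3, 1), (1, 0), (2, 1)]
groups = split_recall_by_enjoyment(sample)
print(groups)
```

The two resulting groups are the kind of input a two-sample rank test such as Wilcoxon–Mann–Whitney would then compare.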

We did not notice any other significant effect (e.g., avatar representation, use of both hands) on enjoyment.

5 Discussion and lessons learned

In this section, we summarize the lessons learned from our study with the aim of informing the future design of touchless gestural interfaces for children based on avatars. Our study provided findings that helped us to better understand how to design such systems for children.

(1) Choosing a movable or immovable avatar depends on the interface goal and age of users

In our experiments, we found that the use of an immovable avatar makes two-handed interactions more effective, in terms of reducing the time required to perform the task. Conversely, a movable avatar better suits users who interact with a single hand. Looking back at prior work, authors have shown that users typically tend to stick to their preferred hand [40]. Nonetheless, it is worth noting that “eliciting” the use of two hands by designing an interface with an immovable avatar may result in more visible interactions, simply because using both hands involves a bigger part of the user’s body in movement. Therefore, if the goal of a designer is to place such an interface in public, methods to elicit more two-handed gestures should not be discarded a priori. On the other hand, one of the most common critiques of “forcing” two-handed interaction relates to the constraints of passers-by, who may carry bags and thus not have both hands available for interacting. This issue is much less significant in the case of children; therefore, an immovable avatar might make sense in public contexts. As a first result of our observations and of the short discussion above, we can conclude that if the goal of the interface is to engage children in some “demanding” activity (such as a game), employing an immovable avatar and two-handed interaction appears to be the optimal solution. If the goal is to let children quickly access some information, then the most suitable option is the movable avatar with an interface layout affording single-handed interactions.

Considering chronological age, we noted that children aged 2–4 and 8–10 years preferred to use both hands. This means that the use of an immovable avatar is the best choice for these age ranges. Moreover, opting for the immovable avatar is even more relevant for younger children, considering their high dropout rate (18%, see results in Section 4.1.1), as well as the difficulties of younger children in adopting the perspective of others in perceptually based tasks [32]. This finding is of particular relevance for designers since, to the best of our knowledge, it has not been documented in the literature before.

(2) Avatar as a means to overcome affordance blindness

The avatar was more effective at communicating touchless gesture-based interactivity to younger children (2–4 years) than to older pupils, who tried to touch the screen more often. However, compared with prior work, the avatar was less effective in helping children understand the interaction modalities than it was with adults (see Section 4.1.2). Therefore, we recommend using avatars with younger children in order to design effective interactions. When designing for older children, using only the avatar might not be an effective way to overcome affordance blindness. Indeed, for children aged 5 to 10 years, it is probably better to also include additional explicit calls-to-action or other techniques [11, 38].

(3) The avatar as the main driver of engagement, enjoyment, and recalling

Across all chronological ages, children enjoyed playing with the avatar and completing the jigsaw. Children perceived the avatar either as an external agent (“the robot”, ID1720) or as “herself” mirrored on the display. In both cases, children enjoyed playing with the avatar. In the literature, the relationship between enjoyment and engagement is well known [42]. Studies have shown that engagement can predict enjoyment [24, 33].

Moreover, enjoyment is essential in learning and education [2, 35]. Our data showed that the more children enjoyed the experience, the better they recalled the content provided during the interaction. Having the avatar as the main interaction driver proved to be a winning choice for engaging pupils in the interaction, thereby facilitating information recall.

(4) Balancing the physical challenge and engagement

Children enjoyed moving their bodies and finding their own ways of interacting with the jigsaw through the avatar. A majority of the children (95%) enjoyed interacting with an interface that makes them “think” (ID1722) and “figure out” (ID1128) how to move the pieces and perform the task. They also mentioned that they liked that it was “challenging to discover how to move the pieces” (e.g., ID3021). In addition, we estimated that children spent on average 69.40 s before accomplishing the task requested in order to complete the training session. This value included the time needed to understand that it was a touchless gestural interaction and the time needed to understand how to operate the interface elements and to trigger possible actions. Furthermore, during our observations, we noticed that children spent most of this time playing with the avatar, after having guessed the interaction modality and before placing the jigsaw piece in the right place. This suggests that the avatar was effective in improving the children’s experience. In support of this, the authors of [26] reported an average interaction duration with a touchless gestural game on a display of about 31 s. This means that the time our children spent simply playing with the interface before accomplishing the actual training task was more than double this.

Thus, designing engaging interaction patterns within this context means striking a good balance between making children understand how to physically operate the interface components (e.g., jigsaw pieces) and, at the same time, challenging them to find their own way. Of course, designers also have to take into account children’s age and abilities [44].

6 Conclusions and future challenges

This study shows that children enjoy interacting with an avatar. We also found that the avatar’s engaging role helps children to better recall content. Moreover, we found evidence that age influences the style of child–avatar interaction. Younger children (2–4 years) tend to guess more readily how to interact correctly with the avatar (i.e., via mid-air gestures) than older children, who tried more often to touch the display. Recent investigations have shown that younger children have less experience interacting with smartphones or tablets when compared with older children [27]. On the other hand, older children do not assume the availability of touchless interaction technologies and often opt for traditional modalities. A future direction could be to conduct a study investigating the effect of prior experience with touch-based interaction on affordance blindness in touchless gestural interfaces.

Finally, our results suggest that avatars could facilitate the recalling of contents related to paintings on the jigsaw. This could be an initial point to be further explored in order to understand how to use avatars when developing new effective educational technologies for young children.

In addition, we plan to redesign the Interactive Art Jigsaw incorporating our findings, and to replicate the study in a real context (i.e., a museum). This would allow us to better understand the effect of the interaction on children, and how they recall information about artworks.