1 Introduction

The study of children's emotions when learning a new task is important because emotions influence their engagement and performance during the learning process. Recent studies (Anagnostopoulou et al., 2021; Baltrusaitis et al., 2016; Karumbaiah et al., 2022) attempt to improve the quality of learning techniques by continuously detecting the development and change in students' emotional experiences, in order to better support individual students and maximize their engagement and learning gains (Anagnostopoulou et al., 2021). However, learning to write vocabulary in any language is challenging, especially for young children and for those with low handwriting skills and moderate memory; this leads to repeated failure and may result in negative emotions. Some studies (Baltrusaitis et al., 2016; Bowers & Hayle, 2020) used haptic devices to help children learn a language independently and efficiently, while others (e.g., Grundmann et al., 2021) monitored the emotional status of each student during e-learning and provided information about the student's mood to the teacher in real time. Some studies focused mainly on features such as pen pressure as an indicator of enjoyment and frustration (Anagnostopoulou et al., 2021), gaze direction (Khan, 2020), facial expression (Young-Seok Kim et al., 2013), pose estimation (Guneysu Ozgur et al., 2020) and many other techniques (Ashwin & Guddeti, 2020; Dewan et al., 2019). However, little research has examined how children with low handwriting skills react to different writing tasks in virtual reality (VR) environments (Baltrusaitis et al., 2016; Savov et al., 2018; Zhao et al., 2018).

To the best of our knowledge, this is the first study exploring children's emotion and attention while handwriting Arabic in VR. Learning to write vocabulary in Arabic is particularly challenging for young children and for those with low handwriting skills and moderate memory. Some children whose native language uses the Latin script and who learn Arabic as a second language mimic Latin letters when they write in Arabic: they start writing from left to right, whereas Arabic words are written from right to left. Moreover, Arabic letters take substantially different shapes depending on whether they connect to a preceding letter, a succeeding letter, or both; thus, every independent letter also has conditional forms. These forms sometimes confuse new learners, especially young children. Haptic devices can help children learn the language independently and efficiently: children can start practicing handwriting letters with higher motivation, learn at home, and repeat the task several times. Psychologists have hypothesized that children with writing difficulties will be cognitively empowered while interacting in VR during the learning process. VR offers positive reinforcement and a better sense of engagement that sustain children's interest.

Given the belief that children's emotions and attentiveness may affect handwriting performance, in this research we aim to explore the link, if any, between participants' emotions and attentiveness and their performance when learning to write Arabic letters. Moreover, there is reason to believe that emotions and attention level may have a positive impact on performance, and that attention might further mediate the link between emotions and learning performance. This assumption is supported by several studies in the field of learning, which show that learning activities are always accompanied by emotions (Pekrun, 2006; Pekrun et al., 2007; Schrader & Kalyuga, 2020; Schrader & Nett, 2018; Schutz & Pekrun, 2007).

Although these associations are not guided by explicit predictions, owing to the inconsistent findings of existing research (Karumbaiah et al., 2022; Schrader & Kalyuga, 2020), our study contributes to a better understanding of the impact of students' emotions and attentiveness on their handwriting performance. Indeed, the relationship between emotion, attentiveness and learning outcome appears to be more complex and less straightforward than is generally assumed (Karumbaiah et al., 2022).

First, students' emotional experience and its contribution to the learning process are not well investigated (Schrader & Kalyuga, 2020). Only a few studies (Schrader & Nett, 2018; Schutz & Pekrun, 2007) have shown the importance of emotions in the classroom, where they can influence, positively or negatively, students' engagement (Schrader & Kalyuga, 2020), their well-being (Hill et al., 2019) and, as a result, their performance. However, a recent review (Karumbaiah et al., 2022) found that students' affective states (confusion, frustration and boredom) in VR learning have mixed relationships with their outcomes. Some of the studies covered by that review found no significant difference in students' academic achievement and retention across affective states, for instance in social studies learning (Karumbaiah et al., 2022). Moreover, some studies suggest that positive emotions, although they promote self-regulation and motivation, may not improve learning, whereas some negative emotions may promote learning by triggering the motivation to learn better (Worsley & Blikstein, 2015).

Second, student attentiveness has been shown to improve learning performance (Au et al., 2016; Fredricks & McColskey, 2012). Conversely, a lack of attentiveness and active engagement with learning activities has become a significant challenge that negatively influences learning outcomes (Creelman, 2021; Derakhshandeh et al., 2021). Many studies have sought to determine students' attentiveness in learning settings (Adam et al., 2009; Kainat et al., 2022; Lipp & Neumann, 2004); however, these studies focus on determining attention states rather than on the actual relationship between attentiveness and learning performance. Only a few studies (Creelman, 2021; Fredricks & McColskey, 2012) examine this relationship. For instance, Creelman (2021) found that inattentiveness accounts for students failing to finish work and failing to pay close attention to details.

To the best of our knowledge, there is no research on the concrete impact of changes in participants' emotional and attentional trends, at least not on Arabic handwriting performance. Our study contributes to a better understanding of the impact of students' emotions and attentiveness on their handwriting performance in a VR environment. Our findings add to a range of existing research and indicate that emotional changes have no direct relation with writing performance, which is affected only by overall emotion. Likewise, performance was not affected by attentional change, but only by eye blinks at the beginning of the experiment. These results have several implications and may help researchers improve the conceptualization and measurement of changes in student emotion and attention. More precisely, our study focuses on how students' emotions and attentiveness, while using a haptic-device-based learning technology, affect their writing performance of Arabic letters. Furthermore, the continuous detection of emotional states through emotion detection models (Anagnostopoulou et al., 2021; K V & Bahel, 2021; Mukhopadhyay et al., 2020) and of blinks through an eye blink detection model (Soukupová & Cech, 2016) provides objective metrics that help address some methodological problems, such as reliance on self-reporting (Schrader & Kalyuga, 2020).

We use a haptic device with a stylus-like tool that allows children to physically touch the pen, feel force and movement, and write in a virtual space within a challenging environment, in order to assess their motivation and focus while also fine-tuning their motor skills. Handwriting is generally difficult for young children even in a 2D space, and writing in a 3D or VR environment is more challenging for many reasons, including arm fatigue and visuomotor coordination issues such as problems with line following and poor fine motor movement. All of this may cause loss of focus, boredom and frustration. However, once resolved, these factors can contribute to mastering proper writing and strengthening the understanding of the language's vocabulary. To this end, we explore real-time data (e.g., temporal and spatial), which provide objective measurements to assess the effect of a new haptic device on learning skills such as vocabulary memorization, endurance, focus and engagement. We investigate whether practicing handwriting with a haptic device can be an effective technique to motivate children to write, and whether it resolves difficulties in the writing process. Using the guided writing mode, which follows the letter trajectories, students can evaluate and correct their writing errors and, as a result, improve their writing skills.

In summary, our study addresses two main research questions:

  1. To what extent do participants' emotions (either at the beginning or at any stage) contribute to better handwriting performance?

  2. Does attention level (either at the beginning or at any stage) have any effect on handwriting performance?

The remainder of this paper is organized as follows. Section 2 presents our approach and the experimental settings used to study the effectiveness of a haptic device for analyzing schoolchildren's emotions, attentiveness and handwriting performance of Arabic letters. Section 3 discusses the results and evaluation. Section 4 presents the key observations and implications, and Section 5 concludes the paper and highlights the limitations and future work.

2 The approach

2.1 Participants

We selected fifty-two children from primary schools in Doha, Qatar. These children were able to hold a pen and manipulate it appropriately. They were of different ethnicities and aged between 5 and 11 years (n = 52, M = 7.2, SD = 1.2), as shown in Table 1. We explained the objective of the study to them, discussed the details of the experiments, answered their questions, and gave them a training session in which they could write anything they liked with the haptic stylus in the VR environment, until they felt relaxed and had mastered the pen manipulation, before starting the experiments.

Table 1 Dataset details

2.2 Equipment

We use the Touch™ haptic device from 3D Systems (https://www.3dsystems.com/haptics-devices/touch). The haptic device is connected to a laptop running control algorithms that allow guided writing (i.e., with partial or full guidance) and free writing. We recorded all the experiments for further analysis. We first obtained Institutional Review Board (IRB) approval to test the haptic device with children in schools in Qatar. After obtaining informed consent from the parents, we instructed the children on the task of handwriting the three letters in 3D using both modes, and we observed and assisted them when required. To evaluate the system, we asked participants to write the letters freehand with an ordinary pen before and after training, in order to analyze the change.

2.3 Scenarios

We designed two scenarios in which participants trace three letters in 3D using the haptic device:

  1. Write three independent letters in guided mode.

    • Show an image of an animal.

    • Show the animal's name with one missing letter.

    • The child writes the missing letter using full or partial guidance.

  2. Write three independent letters in free mode.

    • Show an image of an animal.

    • Show the animal's name with one missing letter.

    • The child writes the missing letter using record mode.

2.4 Haptic task procedure

The haptic task procedure is given in Algorithm 1. Each participant completed three sessions: a training session (~8 min), a pre-test session (~8 min) and a post-test session (~8 min). On average, each participant spent 40 min on this study. The pre- and post-tests consisted of three letter tasks to observe whether there were any differences in performance.

Algorithm 1 The haptic task procedure.

In the task session, we asked the participant to repeat the letter tracing whenever the letter shape deviated substantially from the true trajectory, that is, when the task score was very low. For participants with severe writing difficulty, we repeated the sessions and added a five-minute break between two consecutive sessions. If a participant showed no knowledge of a letter, he/she repeated the tasks more times and we provided path cues to augment visual guidance. Thus, such participants got more opportunities and hints to practice and improve the relevant fine motor skills in 3D. Note that we played short stories with animal characters to augment the visual modality for all participants. At the end of the experiment, we administered a short verbal survey, explained each question to the participants, and encouraged them to express their feelings, as such feedback helps improve the system.

2.5 Datasets

The collected dataset is summarized in Table 2. It comprises 520 handwritten Arabic characters (in both modes) and 52 video recordings covering three different views, namely front, side and webcam. For the task of character recognition, we retained only 156 handwritten Arabic characters, since many handwriting attempts were excluded for being either too noisy or incomplete.

Table 2 Details of collected handwriting dataset

We then cleaned the noise around the characters. Most of the noise occurs around the start and the end of the character trace; some resulted from hand movement while the child was trying to grip the pen and find the start position in 3D. Figure 1 shows cleaned character samples from our dataset.

Fig. 1 Samples of cleaned Arabic characters: (a) Alif, (b) Laam, (c) Faa

2.6 Metrics

To assess the emotion, focus and handwriting performance of the participants, we use both objective and subjective metrics, described in Table 3. For objective emotion prediction, we identify facial landmarks (eyebrows, eyes, nose and mouth) as the critical features, whereas for the subjective assessment we use the teachers' observations of the participants' behavior (Schrader & Kalyuga, 2020). We also use the children's questionnaire to derive, to some extent, their motivation and engagement. Note that the questionnaire reliability, measured with Cronbach's alpha (Wessa, 2021) at the time of development, was 0.56. The questions, given in Table 5, were asked verbally and the answers were noted.
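The reliability coefficient quoted above is Cronbach's alpha, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i and σ²ₜ the variance of the respondents' total scores. The authors used an online calculator (Wessa, 2021); the Python sketch below only illustrates the same formula, and the function name and the score matrix in the example are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items matrix of numeric scores.

    scores: list of rows, one row per respondent, each row holding k item scores.
    Population variances are used throughout; the n/(n-1) factor cancels in the
    ratio, so sample variances would give the same alpha.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of three respondents to two five-point items.
print(round(cronbach_alpha([[1, 2], [2, 2], [3, 3]]), 3))  # 0.857
```

With only two items and three respondents the value is of course not meaningful; a real computation would use all questionnaire items and all children.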

Table 3 Objective and subjective metrics

3 Results and analysis

In the following sections, we report our results and analyze them using Minitab software. The means and standard deviations of the measured variables are shown in Table 4, with p = 0.002 for Group A and p < 0.001 for Group B; we therefore rejected the null hypothesis that all means are equal. Note that Eye blink:B, Eye blink:M and Eye blink:E are the values of eye blinks extracted from the video sequences at the beginning, midway and end of the recorded handwriting task.

Table 4 Mean and standard deviations for eye blinks, attentiveness and writing performance
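Rejecting the null hypothesis that all means are equal is what a one-way analysis of variance tests. Minitab's exact configuration is not given, so as a sketch, assuming a one-way ANOVA over the compared measurements, the F statistic can be computed as below. Turning F into a p-value additionally requires the F-distribution CDF with (k − 1, n − k) degrees of freedom, which is left to a statistics package; the function name and data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    all_values = [x for g in groups for x in g]
    k, n = len(groups), len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```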

3.1 A. Participants’ feedback

The questionnaire included four questions asking for participants' feedback on their experience with the handwriting task using the haptic device. As shown in Table 5, the survey results are reported for two age groups, A (5–7) and B (8–11). The four questions were scored on a five-point scale (Very much (5), A little (4), Neutral (3), Not much (2), Not at all (1)). In the following discussion, Avg1 and Avg2 denote the average scores of groups A and B, respectively. First, participants in both groups liked the handwriting task (Avg1 = 4.73, Avg2 = 4.80) and enjoyed the activity (Avg1 = 3.63, Avg2 = 4.00), indicating positive feedback on the tasks. Second, group B could repeat the activity. However, because of the gripper size, the younger participants took longer to adjust their grip; some could not manipulate the haptic gripper well in the shorter time available (note that they performed guided mode only), especially when they applied large finger pressure. In addition, the participants felt tired and had some pain (Avg1 = 2.70, Avg2 = 3.30). We can also see that the free mode was more difficult for the participants. Overall, the participants' answers were to some extent consistent with their handwriting performance.

Table 5 Participants’ feedback from the questionnaire

3.2 B. Participants’ emotions and focus

Emotions and eye blinks indicate intrinsic cognition through extrinsic responses (Hömke et al., 2018; Neumann et al., 2004), and they can reveal an individual's focus and attention (Adam et al., 2009; Lipp & Neumann, 2004). This allows us to use them as performance metrics for attention and focus in this experiment. The process was automated with deep learning, using a CNN model for emotion detection (Goodfellow et al., 2015) and an eye blink detection model (Soukupová & Cech, 2016). First, each frame of the video captured during the experiment is analyzed. Then, blinks are identified as the frames in which the eyes are closed, which are detected with the eye aspect ratio (EAR). The blink frames were counted and accumulated for each participant. Attentiveness was measured with the following equation (Patil et al., 2021), and the results by age group are illustrated in Figs. 2 and 3:

Fig. 2 Total blink rate by age group

Fig. 3 Attentiveness by age group

$$\text{Attentiveness} = \frac{\text{TotalFrames} - \text{BlinkFrames}}{\text{TotalFrames}}$$
(1)
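Eq. (1) only needs a per-frame decision about whether the eyes are closed. In the EAR approach of Soukupová & Cech (2016), six landmarks per eye give EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2‖p1 − p4‖), which drops towards zero as the eye closes. The Python sketch below assumes the landmarks are already supplied by a face-landmark detector (not shown) and uses a fixed EAR threshold of 0.2 and a minimum blink length of two frames; both values, and all function names, are illustrative assumptions rather than the study's settings. The original EAR paper also trains an SVM on short windows of EAR values; a fixed threshold is the simpler variant.

```python
import math


def eye_aspect_ratio(eye):
    """EAR of one eye from six (x, y) landmarks p1..p6 (Soukupova & Cech, 2016).

    p1 and p4 are the eye corners; p2, p3 lie on the upper lid and p6, p5 below them.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def attentiveness(ear_per_frame, threshold=0.2):
    """Eq. (1): fraction of frames in which the eyes are NOT classified as closed.

    threshold is an illustrative value, not one reported for this study.
    """
    total_frames = len(ear_per_frame)
    blink_frames = sum(1 for ear in ear_per_frame if ear < threshold)
    return (total_frames - blink_frames) / total_frames


def count_blinks(ear_per_frame, threshold=0.2, min_frames=2):
    """Count blinks as runs of at least min_frames consecutive closed-eye frames."""
    blinks, run = 0, 0
    for ear in ear_per_frame:
        if ear < threshold:
            run += 1
            continue
        if run >= min_frames:
            blinks += 1
        run = 0
    if run >= min_frames:  # a blink still in progress at the last frame
        blinks += 1
    return blinks


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
print(round(eye_aspect_ratio(open_eye), 3))     # 0.667
ears = [0.3, 0.3, 0.1, 0.1, 0.3]
print(attentiveness(ears), count_blinks(ears))  # 0.6 1
```

Counting blink frames (for Eq. 1) and counting blink events (for the blink rate) are kept separate here because one blink usually spans several frames.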

Fig. 4 shows the changes in eye blinks over time, where begin, midway and end correspond to Eye blink:B, Eye blink:M and Eye blink:E, respectively. Table 6 reports the p-values for the significance of eye blinks during the handwriting task. The effect of haptic handwriting on eye blinks was significant in the middle (p < 0.001) and at the end (p = 0.001) of the experiment.

Fig. 4 Blink rate by age group at the beginning (a), middle (b) and end (c) of the handwriting task

Table 6 P-value for blinks over time during the task

For emotion detection, seven emotional states, namely happy, sad, fearful, angry, surprised, neutral and disgusted, were identified using a state-of-the-art model trained on the Facial Expression Recognition (FER) dataset (Goodfellow et al., 2015), which uses Haar features to identify the face and detect the emotion. The captured videos of each subject were analyzed with this model and the emotion of each frame was noted. Finally, the mode of all per-frame emotions was taken as the overall emotional state. The experiment was divided into three zones (beginning, middle and end), and the emotional state of each zone was determined in the same way. This distinguishes the emotion at each stage of the experiment and thus leads to a conclusion about the overall state of the participant throughout the experiment. Because the deep learning model's accuracy is 63.2%, the results were further verified on random subsets to validate the data and produce an observed analysis of the emotion. Figure 5 depicts the emotions measured from the video analysis during each scenario of the experiment. Where the predicted emotions are unreliable, human observation is used to decide whether it agrees with the emotion predicted by the deep learning model, as in (Dahmane & Meunier, 2011).
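The per-zone aggregation described above (one label per frame, then the most frequent label per zone and overall) can be sketched as follows. The text does not state how the zone boundaries were chosen, so splitting the frames into three equal thirds is an assumption, as are the function name and the example labels; the per-frame labels themselves would come from the FER-trained model.

```python
from collections import Counter


def zone_emotions(frame_labels):
    """Mode of per-frame emotion labels for the begin/mid/end thirds and overall.

    Ties go to the label encountered first, as documented for Counter.most_common.
    """
    n = len(frame_labels)
    cut1, cut2 = n // 3, 2 * n // 3
    zones = {
        "begin": frame_labels[:cut1],
        "mid": frame_labels[cut1:cut2],
        "end": frame_labels[cut2:],
        "overall": frame_labels,
    }
    return {name: Counter(labels).most_common(1)[0][0]
            for name, labels in zones.items() if labels}


labels = ["neutral", "neutral", "neutral", "happy", "happy", "neutral", "sad", "sad", "angry"]
print(zone_emotions(labels))
# {'begin': 'neutral', 'mid': 'happy', 'end': 'sad', 'overall': 'neutral'}
```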

Fig. 5 Emotions by age group at the beginning (a), middle (b) and end (c) of the handwriting task

We also examined the changes in participants' emotions over time (beginning, midway, end, overall) during the handwriting task. Table 7 reports the p-values for the significance of emotional responses during handwriting. The effect of haptic handwriting on sadness was significant midway (p = 0.034) and at the end (p = 0.014).

Table 7 P-value for emotions over time during the task

3.3 C. Participants’ performance

In this section, we first report the effect of changes in emotion and attentiveness during the process on handwriting performance. Second, we focus on the effectiveness of our handwriting activity in developing the children's fine motor skills for writing Arabic letters. We used analysis of variance (ANOVA) to compare the means of the two groups. The results show a significant effect of participants' eye blinks at the beginning on their performance in the practice part of the Arabic letters task (F = 74.36, p = 0.001), indicating a positive learning gain in handwriting. While no significant effect of emotion changes on performance was found either midway or at the end of the task, overall emotion had a significant effect (p = 0.0014). Furthermore, as reflected by the means for overall emotion, performance was affected by angry emotion, with a mean of 92.6, as shown in Fig. 6. No significant impact of observed emotion on performance was found (p = 0.366). Performance was measured with the following equation, and the results by age group are illustrated in Figs. 8 and 12:

Fig. 6 Performance versus overall emotion

$$Performance= \frac{DTW\ score}{Completion\ time}$$
(2)

Note that we were initially interested in letter-tracing accuracy only. By treating completion time as a cost function (Williams et al., 2016), this performance measure allowed us to account for the speed-accuracy trade-offs that participants make when performing the handwriting task. Thus, we could determine whether better tracing accuracy resulted from increased handwriting skill or simply from slower movement. A higher performance value indicates efficient and overall good task performance.
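The text does not spell out how the DTW score is computed or normalized. As a sketch only, assuming that a trace and its reference trajectory are both sequences of (x, y) points, the classic dynamic-time-warping recurrence D(i, j) = d(aᵢ, bⱼ) + min(D(i−1, j), D(i, j−1), D(i−1, j−1)) gives an alignment cost that tolerates differences in writing speed, and Eq. (2) then divides a DTW-based score by the completion time. The function names, the Euclidean point distance and the example trajectories are assumptions.

```python
import math


def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping cost between two (x, y) point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def performance(dtw_score, completion_time_s):
    """Eq. (2): DTW-based score divided by completion time (seconds assumed)."""
    if completion_time_s <= 0:
        raise ValueError("completion time must be positive")
    return dtw_score / completion_time_s


reference = [(0, 0), (1, 0)]
trace = [(0, 1), (1, 1)]
score = dtw_distance(trace, reference)
print(score, performance(score, 4.0))  # 2.0 0.5
```

Because DTW aligns points non-linearly in time, a child who pauses midway is not penalized for the pause itself, only for spatial deviation; time enters separately through the denominator of Eq. (2).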

The pre- and post-tests were scored on a five-point scale (Very much (5), A little (4), Neutral (3), Not much (2), Not at all (1)) to measure progress in the letters learnt during the sessions. To analyze whether there was overall learning in writing letters for all children across all sessions, we performed a paired t-test (the data being normally distributed), which indicated that post-test scores were significantly higher than pre-test scores (p = 0.015). However, the 1-to-5 ranking did not reflect the improvement in writing performance of children who were initially unable to write at all. In some instances, the pre-test performance was not gradable (no letter was written) and the post-test performance was very low but comparably closer to actual writing: some learning clearly took place, yet both attempts received low ranks. To quantify writing performance more reliably by focusing on letter shape quality, we switched to DTW; the results are shown in Figs. 7 and 8. Moreover, the average DTW and performance scores in the training session for each participant are shown in Figs. 9 and 10, respectively.
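A paired t-test works on the per-child differences between post-test and pre-test scores: t = d̄ / (s_d / √n) with n − 1 degrees of freedom, where d̄ and s_d are the mean and sample standard deviation of the differences. The sketch below computes only the statistic; the p-value then follows from the Student t distribution (for example via a statistics package), and the pre/post scores shown are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for the post - pre differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t, df = paired_t([1, 2, 3, 4], [2, 4, 4, 6])
print(round(t, 3), df)  # 5.196 3
```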

Fig. 7 Average DTW results by age group

Fig. 8 Average performance by age group

Fig. 9 Average DTW results for each participant

Fig. 10 Average performance results for each participant

Regarding completion time, the average completion time for handwriting the letter Laam was shorter for both groups, as shown in Fig. 11. The effect of age group is not statistically significant (p = 0.114). However, completion time and post-test were significant, with p-values of 0.001 and 0.008, respectively. This indicates that both groups improved their hand control of the haptic device to trace the letter correctly, and improved their post-test results, as shown in Fig. 12. Figure 13 gives the correlations between all metrics, while Table 8 lists only the strong correlations.
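The text does not say which coefficient underlies Fig. 13 or what counts as a strong correlation in Table 8. Assuming Pearson's r and an illustrative cut-off of |r| ≥ 0.7, the pairwise correlations and the strong pairs can be obtained as sketched below; the function names, metric names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strong_correlations(metrics, cutoff=0.7):
    """Return {(name_a, name_b): r} for every metric pair with |r| >= cutoff."""
    result = {}
    for a, b in combinations(metrics, 2):
        r = pearson_r(metrics[a], metrics[b])
        if abs(r) >= cutoff:
            result[(a, b)] = r
    return result


metrics = {
    "completion_time": [30.0, 42.0, 25.0, 50.0],
    "dtw": [10.0, 14.0, 9.0, 17.0],
    "blinks": [5.0, 3.0, 6.0, 4.0],
}
print({pair: round(r, 2) for pair, r in strong_correlations(metrics).items()})
```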

Fig. 11 Average completion time by age group and Arabic letter

Fig. 12 Average performance by age group and Arabic letter

Fig. 13 Correlation matrix between all metrics

Table 8 Strong correlations between performance metrics

3.4 D. Participants’ character recognition

For most computer vision applications, deep learning architectures have been the de facto choice (Ali et al., 2020; Balaha et al., 2020). Their effectiveness, however, relies heavily on access to large labelled datasets. In this experiment, we use a relatively small dataset with only three classes, so classical machine-learning algorithms should be sufficient. The dataset has 147 images classified into three classes, Alif (أ), Faa (ف) and Laam (ل), as shown in Table 10. Alif has 51 samples, while Faa and Laam have 48 samples each. We used tenfold cross-validation to estimate the skill of the proposed machine-learning models.
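The text specifies tenfold cross-validation but not how the folds were formed. A common choice, assumed here, is stratified folds that preserve the Alif/Faa/Laam proportions in each fold. The sketch below only produces the index folds (a classifier would then be trained on nine folds and tested on the tenth, ten times over); the function name and seed are illustrative.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # carries the round-robin position across classes so fold sizes stay balanced
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % k].append(idx)
        offset += len(indices)
    return folds


labels = ["alif"] * 51 + ["faa"] * 48 + ["laam"] * 48
folds = stratified_k_folds(labels, k=10)
print(sorted(len(f) for f in folds))  # [14, 14, 14, 15, 15, 15, 15, 15, 15, 15]
```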

To prepare the data for training, we used the PHOG (Pyramid Histogram of Oriented Gradients) Filter proposed by Bosch et al. (2007) for feature extraction. PHOG captures how intensity gradients are oriented across an image: at each resolution level, it consists of a histogram of gradient orientations over each image sub-region. The distance between two PHOG descriptors indicates how similar the images are in terms of shape and spatial layout.
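As a rough illustration of the descriptor described above, the Python sketch below histograms unsigned gradient orientations (0 to 180 degrees), weighted by gradient magnitude, over grids of 1 × 1, 2 × 2 and 4 × 4 cells, concatenates them and normalizes the result. It is a simplified stand-in, not the PHOG Filter used in the experiments: Bosch et al. compute orientations on edge (Canny) contours, whereas this sketch uses every pixel's gradient, and the bin count, pyramid depth and function name are assumptions.

```python
import math


def phog(image, bins=8, levels=2):
    """Simplified PHOG descriptor for a grayscale image given as a list of rows."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ori = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image borders.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            ori[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
    descriptor = []
    for level in range(levels + 1):
        cells = 2 ** level  # level l splits the image into 2^l x 2^l cells
        for cy in range(cells):
            for cx in range(cells):
                hist = [0.0] * bins
                for y in range(cy * h // cells, (cy + 1) * h // cells):
                    for x in range(cx * w // cells, (cx + 1) * w // cells):
                        b = min(int(ori[y][x] / 180.0 * bins), bins - 1)
                        hist[b] += mag[y][x]  # magnitude-weighted vote
                descriptor.extend(hist)
    total = sum(descriptor)
    return [v / total for v in descriptor] if total > 0 else descriptor


# Hypothetical 8x8 image with a vertical edge: only horizontal gradients (bin 0).
image = [[0] * 4 + [1] * 4 for _ in range(8)]
descriptor = phog(image)
print(len(descriptor), round(descriptor[0], 4))  # 168 0.3333
```

The descriptor length is bins × (1 + 4 + 16) = 168 for two pyramid levels; the coarse level captures the overall stroke direction of a letter, the finer levels where in the image each direction occurs.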

Three classifiers were tested to classify the images, namely SMO, Random Forest and Multilayer Perceptron. Sequential Minimal Optimization (SMO) was proposed by Platt (1998) to solve the quadratic programming (QP) problem that arises when training support vector machines. A Random Forest is an ensemble of many individual decision trees: each tree produces a class prediction, and the class with the most votes becomes the model's prediction. The Multilayer Perceptron is a type of neural network, whose strength derives from its ability to learn a mapping from the representation in the training data to the output variable to be predicted; with enough hidden units, such networks are universal function approximators. The results in Table 9 show the performance of the three classifiers: the PHOG features yield high accuracy with all three classifiers, and the Multilayer Perceptron performs best.

Table 9 Results of using PHOG feature extraction for SMO, Random Forest and Multilayer Perceptron classification

4 Discussion

4.1 A. Observations

Currently, the effectiveness of learning technologies that integrate haptic devices for educational purposes has not been investigated for young students. Moreover, existing studies involve methodological issues such as small sample sizes (Schrader & Kalyuga, 2020). Therefore, to contribute to research on learning technologies with haptic devices, the purpose of this study was to determine the relationship between participants' emotion and attentiveness and their resulting performance, as well as the effectiveness of writing Arabic letters in such a challenging VR environment. Overall, the results confirm that using the haptic device had a significantly positive effect on students' writing performance, as measured by comparing the pre-test and post-test. However, no significant effect of emotional change was detected. After careful analysis, we make the following key observations:

  • First, results from this feasibility study indicate that the haptic task was acceptable to the study participants. The participants did not face major difficulty in using the haptic device and the system. However, not all participants completed the whole session due to time constraints. Because the training time with the haptic device was quite limited, these results have to be interpreted with caution. According to Pekrun (2006), emotional experience has an impact on learning outcomes in more extensive learning situations.

  • Second, our results, as can be seen from Table 7, show no direct relation between emotion change and resulting performance. These findings are not in line with other research (Hill et al., 2019; Pekrun, 2006; Pekrun et al., 2007; Shute et al., 2015) that has presented theoretical and empirical evidence for the impact of emotions on students' resulting performance. Our findings might be related to VR settings, which have produced mixed results for many reasons, such as the experimental settings. For instance, Karumbaiah et al. (2022) pointed out that there is no significant difference in students' academic achievement and retention across emotional states in different VR learning contexts, particularly in social studies learning.

  • Third, the analysis in Table 6 indicates that eye blinks modulate participants' focus: blinks were significantly reduced in the middle and towards the end of the handwriting task (p < 0.05). This finding partly confirms and builds on the common notion that, after a while, students' eyes start to focus more, and so does their attention. However, our results in Table 8 suggest that attentiveness has a very weak effect on resulting performance. These findings are not consistent with those of previous studies (Au et al., 2016; Fredricks & McColskey, 2012), which build on a long history of research on the impact of attentiveness on learning outcomes. Those studies show that attentiveness has a positive impact on performance and might further mediate the link between emotions and learning performance.

  • Fourth, the analysis in Table 7 indicates that sadness affects the performance scores in both groups, with p-values of 0.014 and 0.034 for the two groups. This cannot be explained by the common notion that sadness indicates focus. However, this finding seems to be in line with the belief that some negative emotions may promote learning by motivating students to learn better (Worsley & Blikstein, 2015). Alternatively, participants who displayed sadness may also have been affected by external factors, such as reluctance to do the experiment after school hours, or the deep learning model may have misinterpreted their emotion due to model bias.

  • Fifth, regarding the children's handwriting performance, we observed significant improvements for two young children who had had difficulty tracing the correct shape and orientation (see the Arabic letter samples in Table 10), while most participants performed well, as their letter tracings were accurately recognized (Table 9).

    Table 10 Samples of collected handwriting (guided and free) with DTW error scores, cleaned versions, and pre- and post-tests for each character

To analyze whether all children grasped letter writing across all sessions, we compared both groups: DTW error scores were significantly lower for group 1 than for group 2, as shown in Fig. 7, although no significant effect of the age-group variable was detected for either age group (p = 0.351, p = 0.356). Figure 9 shows the DTW scores for all participants. The errors were due to the participant writing a segment in the wrong direction, missing a segment, or missing a directional change entirely and continuing in the same direction. In addition, many participants started the letter tracing from left to right or from bottom to top, which causes a large error score. Some missed the dot after finishing the letter tracing, while others lost focus and had difficulty controlling the grip of the pen.

We note that maintaining precise visuo-motor coordination was challenging for most participants. Hand stability affected grip strength and grip precision: we observed participants moving their hands while writing, which introduced unnecessary noise into the traces. In particular, at the beginning and end of a trace, participants who did not know the letter shapes drew strokes without noticing. Nevertheless, we believe that the training session in guided mode, followed by the use of force feedback, is responsible for the improvements observed here (in engagement, handwriting legibility, and speed) and represents a possible enhancement over traditional methods. This assumption appears to be in line with the literature on the contribution of haptic devices to learning (Anagnostopoulou et al., 2021).

As shown in Fig. 13, attentiveness and performance were weakly positively correlated. Likewise, total blinks and performance were weakly positively correlated, whereas we found no significant correlation between total blinks and attentiveness. In addition, the correlation between the observed and the predicted emotion was weak, indicating essentially no relationship between the two; that is, the output of the emotion detection model was not associated with human observation. Although our observations are limited by the absence of self-reported or additional measures against which to validate the emotions, the strong correlation between performance and overall emotion suggests that CV models might detect emotional states more reliably than human observation. Nonetheless, we found a strong correlation between the observed emotion and the participants' emotion midway through the experiment. Table 8 shows that strong correlations occurred only among the performance metrics.
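The correlations discussed in this section, such as attentiveness versus performance, pair two per-participant measurements. The sketch below assumes Pearson's r on two equal-length lists; the study may have used a different coefficient or a parametric significance test, and the permutation test here is a swapped-in, distribution-free way to obtain a two-sided p-value. The function names, the number of permutations, and the seed are hypothetical choices.

```python
import math
import random
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(
    x: Sequence[float], y: Sequence[float], n_perm: int = 5000, seed: int = 0
) -> float:
    """Two-sided p-value: share of shuffles of y whose |r| reaches the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any pairing between x and y
        if abs(pearson_r(x, y_perm)) >= observed - 1e-12:
            hits += 1
    # add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)
```

Called as `pearson_r(attentiveness, performance)` and `permutation_p_value(attentiveness, performance)` on per-participant lists, this yields a coefficient whose sign and size can be read as "weak" or "strong" together with a significance estimate; with small samples like a single school batch, the permutation approach avoids assuming normally distributed scores.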

4.2 Implications

Implications for teaching: As discussed in the first section, emotions and attentiveness are often treated as contributors to learning performance. However, as this work shows, emotion/attentiveness-outcome relationships are not as straightforward as is generally assumed. The relationship is even more complex when considering emotion changes, eye blink changes, long interaction time, physical and mental fatigue, etc. Emotion changes were not directly related to performance; performance was affected only by overall emotion (angry/neutral) and by eye blinks at the beginning of the experiment. Hence, more caution is needed when making assumptions about the role of emotion in student learning. Other factors, such as the learning context (duration, mental fatigue, etc.), may also play an important role and would be valuable to examine in such a VR task. Accounting for these factors could allow an optimal learning and teaching style to be assigned to each student. It would also help teachers adapt to individual students so that they can learn successfully.

Implications for research: previous studies report mixed findings (Karumbaiah et al., 2022; Schrader & Kalyuga, 2020) on the role of student affective states in learning. More research is needed to establish how emotion and attentiveness should be conceptualized so that they can be measured effectively. First, current emotion detection methods may need to be improved to capture rare emotional states and to handle occlusions more effectively. Second, the null results for rare emotional states could reflect a limitation of current approaches, which fail to recognize the role of sparse but potentially important cases. Third, a wider range of variables could be measured beyond emotion/attentiveness and handwriting outcomes, such as engagement. Fourth, observing and questioning students about their emotions during VR tasks distracts them from learning, which might also influence their judgments (Schrader & Kalyuga, 2020). To avoid such distraction, better indicators of discrete emotional and attentional states are needed.

5 Conclusion

There is limited research on using haptic devices for handwriting to improve children's focus and fine motor skills. This study shows the feasibility of using such a task to explore the challenges of, and solutions for, handwriting in what resembles a harsh learning environment. As expected, significant changes in emotions and attentiveness, as indicators of focus, were observed over the course of the study. However, no direct relation between emotion change and writing performance was found. Performance was affected only by overall emotion (angry/neutral) and by eye blinks at the beginning of the experiment.

Although the findings of the present study were supported by statistical significance, some limitations should be mentioned. First, because school access was short-term under COVID restrictions, we ran the experiment in a single batch, with no possibility of extending it or of repeating it with the same groups of participants. Second, the current study relied on manual video editing, human observations, and emotion detection models, rather than psycho-physiological measures, to measure perceived and emotional responses. Video editing and human observations capture the subjective experience of participants' emotions and can thus be inconsistent, unreliable, and difficult to reproduce (Devillers et al., 2005). Another limitation is the use of CV models for eye blink and emotion detection (Goodfellow et al., 2015; Soukupová & Cech, 2016). Some participants wore masks, which impaired emotion detection because key facial features were covered. For these participants, the emotional state and the eye blink count were deduced by human observation and from participant feedback, so only an overall observed emotional state could be determined. The emotion and eye blink detection approaches could be further improved to handle diverse faces and occlusions, which may yield more accurate results. In addition, because we observed the students' emotions separately, we did not consider emotional engagement. Motivation was not assessed either, as the means used in this study did not allow engagement and motivation to be measured. Other studies (Hamari et al., 2016; Wang et al., 2016) relied on lengthy questionnaires during the experiments, which did not seem practical in our case. Other attributes, such as gender, gaze direction, and head pose, are expected to further modulate emotional responses.
Finally, the current study did not consider real-time interaction with the system, which could strongly influence emotional responses. Multimodal interaction could be an interesting direction for future research. In addition, more attractive tasks and a wider variety of fine-motor tasks must be developed to retain participants' attention, especially in a long-term study. In future work, we plan to test this system with individuals who have visuo-motor skill issues and/or handwriting learning disabilities, to help them improve and retain fine motor skills.
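The eye blink counts mentioned in the limitations rely on a CV approach (Soukupová & Cech, 2016) whose core quantity is the eye aspect ratio (EAR), computed from six landmarks per eye: EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), where p1 and p4 are the eye corners. The sketch below assumes the landmarks are already supplied per frame by some facial landmark detector, and it counts a blink whenever EAR stays below a fixed threshold for a few consecutive frames. The threshold of 0.2 and the two-frame minimum are illustrative assumptions, not values from this study, and Soukupová & Cech also describe an SVM classifier over a window of EAR values that this simple thresholding does not reproduce.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR from six eye landmarks ordered p1..p6 (p1 and p4 are the corners)."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)  # two upper/lower lid distances
    horizontal = math.dist(p1, p4)                    # corner-to-corner width
    return vertical / (2.0 * horizontal)


def count_blinks(
    ear_series: Sequence[float], threshold: float = 0.2, min_frames: int = 2
) -> int:
    """Count runs of at least `min_frames` consecutive frames with EAR < threshold."""
    blinks = 0
    run = 0  # length of the current below-threshold run
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # eye still closed at the end of the clip
        blinks += 1
    return blinks


# Hypothetical open-eye landmarks: corners at x = 0 and x = 3, lids 2 units apart.
open_eye = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 0.0), (2.0, -1.0), (1.0, -1.0)]
print(round(eye_aspect_ratio(open_eye), 3))  # 0.667
```

Requiring several consecutive low-EAR frames filters out single-frame landmark jitter; when landmarks cannot be located reliably, however, neither the EAR nor the resulting blink count is meaningful, which is consistent with falling back on human observation in such cases.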