1 Introduction

Social anxiety disorder (SAD), otherwise known as social phobia, is a type of anxiety disorder characterized by feelings of fear and anxiety in social interactions, and by a strong desire to make a favorable impression on others, along with insecurity about being able to do so [1]. This can lead individuals suffering from SAD to be preoccupied with how others evaluate them, and to notice what went wrong in social interactions rather than what went right [1]. The tendency of such individuals to construct negative images of themselves and develop anticipatory anxiety about future interactions further exacerbates the problem. Previous research reports that 10–25% of university students have impaired functioning due to SAD [1,2,3,4], with strong correlations with deficits in social skills and relationships, attention difficulties, and learning problems, as well as with an increased risk of exam failure and failure to graduate [1,2,3,4]. Seeking professional help can be difficult for students with SAD, since their social discomfort may make it challenging to express the difficulties they are having [1, 2]. Cognitive behavioral therapy (CBT) is a psycho-social intervention commonly used in the treatment of SAD [5]. CBT focuses on giving individuals skills that help them change negative patterns of thinking and behavior. These skills are often taught through three techniques: cognitive restructuring, mindfulness training, and exposure therapy [6]. The aim of this study is captured by the following research question and objective:

  • Research question: Can a serious game be designed to engage university students in raising their awareness of cognitive behavioral therapy skills associated with social anxiety disorder?

  • Objective: Participants have an increased awareness of CBT skills after playing the game, measured by a minimum 60% accuracy score in a post-game knowledge check.

With inspiration from Ciman et al. [7], we defined a serious game as an application designed not only for fun, but also to engage users in an activity that produces a common good or teaches something valuable to the player. With inspiration from Hookham et al. [8], we defined engagement as the intensity and emotional quality of a user’s involvement in initiating and carrying out activities.

Engaged users show sustained behavioral and cognitive involvement in activities accompanied by a positive emotional tone. The treatment of SAD is beyond the scope of this study, but the developed serious game will be evaluated based on the level of player engagement, and the level of awareness of CBT skills after playing the game. The elements of design and purpose within serious games for mental health are already well covered [7, 9,10,11]. However, the evaluation part is understudied [9], including how to define and measure engagement within applied serious games for mental health, and social anxiety disorder.

The paper is organized as follows: Section 2 presents related and previous works, Section 3 presents the developed game, including the design and implementation, while in Section 4 we present the evaluation methods, including study design and procedure. Section 5 provides the results and discussion. Finally, Section 6 presents conclusions and directions for future research studies.

2 Previous research

Previous research has reported CBT to be an effective form of intervention in the treatment of SAD [5, 12]. The central principle behind CBT is understanding how thoughts, emotions, and behaviors are interrelated – that how a person thinks will affect the way they feel, which in turn will affect how they act. This can explain the thought process of a person with SAD – their negative thoughts about how they are perceived lead to safety behaviors, which in turn lead to negative emotions, resulting in a positive feedback loop (i.e. a vicious cycle) [5, 12].

There are already applications designed to provide help with SAD (which also target university students) in almost all of the six main types of applied serious games for mental health outlined by Fleming et al. [9]. These include, for example, the applications “Journey to the Wild Divine,” “Freeze-Framer 2.0,” “Pacifia,” “Self-Help for Anxiety Management (SAM),” “Challenger” [13], “SuperBetter” [14], and “SPARX” [15]. These applications all offer either exercises (e.g., for relaxing or breathing), quests, or community support and feedback. However, still missing are types of serious games with informative purposes and content associated with fear and anxiety in specific social interactions. Further, also missing is a focus on how to raise and evaluate awareness and engagement within a serious game.

O’Brien and Toms [16] described, within a conceptual framework for engagement, that engagement is an ongoing process, with periods of sustained engagement, eventual disengagement, and the possibility of re-engagement. This ongoing process can be repeated multiple times within a single session – which would be an indication of an effective and engaging digital system [16]. O’Brien and Toms [16] argued that each stage of user engagement is characterized by a number of attributes, which they later used to develop items for a multi-dimensional scale for measuring user engagement (UES) [17]. In a more recent study, O’Brien et al. [18] altered the structure of the UES to account for four instead of six constructs of engagement (combining endurability, novelty, and felt involvement within the new attribute ‘reward’). This was done to shorten the UES but still maintain its validity. The resulting engagement evaluation is referred to as the user engagement scale short-form, UES-SF [18], and provides detailed instructions regarding the scoring of user engagement, also within serious games. Previous literature has also focused on narrative engagement, and developed suggestions for evaluations [19,20,21,22], typically based on scores within specific constructs.

Although O’Brien and Toms [16] define engagement as a quality of user experience and provide an evaluation framework [17, 18], it is still highly complex to integrate and evaluate engagement. The challenge is that engagement, also in the context of serious games, encompasses various concepts related to the user experience, including e.g. immersion, presence, flow, transportation, and absorption [16, 23, 24]. The interrelated nature of these concepts results in engagement often being used without a clear definition, leading to possible confusion in measuring how engaged a user is within e.g. serious gaming. Most often the engagement in serious gaming is a means to provide some kind of learning [7, 22]. Learning is also a multidimensional construct consisting of behavioral, affective, and cognitive engagement [8]. Behavioral engagement is focused activity on a task, typically measured as time on a task [8]. Cognitive engagement is the mental activity associated with the content, and can be measured either by the accomplishment of the goals of a game, or through pre- and post-intervention tests [8]. Affective engagement is the emotional response of players towards the game content, and can be measured in terms of simpler emotional cues (i.e. positive or negative affect), or more complex emotions (e.g. curiosity, interest, excitement) [8]. Affective engagement can also be measured in terms of valence (positive or negative affect) and arousal (intensity of felt emotion) [25].

3 The game - design and implementation

The serious game was built using the Unity 3D engine and the C# programming language for Windows, Mac, and Linux. The game’s visual elements were designed to look clean as well as stylish to achieve high aesthetic appeal amongst players. The game used a low-poly art style, giving it realistic and simultaneously stylish graphics. The scenarios were well lit, with lighting intended to be perceived as natural. The low-poly art style was implemented through the use of Unity asset packs. The character models and animations in the game were based on Mixamo [26]. The serious game had an underlying narrative whereby a non-player character (called Sam) explained the techniques associated with CBT to the main character called Thomas, a university student who is suffering from SAD. The final game was divided into four distinct scenarios, each mimicking common anxiety-provoking situations associated with university life. As Thomas gets himself into these anxiety-provoking situations, Sam introduces himself to Thomas and offers his help. Sam served as an empathetic character [27], with the narrative reasoning being that he had attended CBT before.

3.1 The scenarios

Each of the four scenarios was followed by an explanation of CBT or one of its associated techniques. Players could navigate each scenario freely, but could proceed to the next stage only if they initiated conversations with the characters. The scenarios escalated in difficulty in terms of gameplay sequences, subject material, and the state of flow [23]. At the end of the game, a brief recap was presented to the players. The game consisted of the following scenarios with the included CBT techniques:

  • Scenario 1: Attending a house party → CBT introduction

The first scenario took place at a house party (Fig. 1). This setting was chosen not only because social gatherings can be an important aspect of a student’s life, but also because it can be used to demonstrate how the behavior of someone suffering from SAD can be affected by their negative thoughts. In this first scenario, the player was introduced to Sam, the non-player character.

Fig. 1 Scenario 1 and introduction to Sam and the CBT skills at the house party

  • Scenario 2: Reflecting on the events of the party → Cognitive restructuring (CR)

Scenario 2 took place within a student group discussion in a student room at the university campus in which both Thomas and Sam were present. Sam asked Thomas to perform cognitive restructuring (Fig. 2) by reflecting on his experience from the house party (from Scenario 1).

Fig. 2 Scenario 2 and an example of cognitive restructuring

  • Scenario 3: Meeting with people in the cafeteria → Mindfulness training (MT)

The third scenario took place in the university’s cafeteria, sometime during the afternoon. Thomas’ social anxiety increased, with constant hot (negative) thoughts (Fig. 3), because he wanted to talk to Lea, a friend of Sam. Sam explained the concept of mindfulness training, then instructed Thomas to go and talk to Lea whilst using mindfulness training.

Fig. 3 Example of constant hot thoughts in scenario 3, visualized in Booleans

  • Scenario 4: Purposefully falling in the cafeteria → Exposure therapy (ET)

The fourth scenario also took place in the university’s cafeteria, this time during lunch break. The aim of Scenario 4 was to introduce the players to the concept of exposure therapy. Thomas’ exposure therapy was implemented by having him fall over with a tray of food in front of a crowd of people (Fig. 4).

Fig. 4 Falling over with a tray of food in front of a crowd of people

3.1.1 Recap → CR, MT, and ET

A recap was included at the end of the game. The primary aim of the recap was to sum up the most important information presented in the game regarding the CBT skills. Sam, the non-player character, summarized the important learning points (Fig. 5).

Fig. 5 Sam summarizes learning points from the scenarios

3.2 Autonomous symptoms

Where appropriate, autonomous physical symptoms of SAD (e.g. increased heartbeat, shaking/trembling, high noises) were simulated within the four different scenarios. The different anxiety-provoking situations in the scenarios caused Thomas’s autonomous symptoms to worsen. The simulation of the autonomous symptoms was implemented using two functions: AutoSymptoms() and CameraShake(). These functions controlled the volumes of a heartbeat and an eerie violin sound effect, and the shaking of the first-person camera. The AutoSymptoms() function ran continuously in the background (inside the Update() function), and was controlled by an integer called ‘autoSymSeverity’. A switch statement on this integer determined the severity of the autonomous symptoms that the player should be experiencing, as seen in Fig. 6.

Fig. 6 Part of the switch case inside the AutoSymptoms() function

As the value of ‘autoSymSeverity’ increases (to a maximum of 5), the severity of the autonomous symptoms worsens – the volume and pitch of the heartbeat sound effect increases, as does the volume of the eerie violin sound effect. Except for the case where ‘autoSymSeverity’ equals 0, the CameraShake() function, as seen in Fig. 7, is also called.
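As an illustration only, the mapping from ‘autoSymSeverity’ to symptom intensity described above could be sketched as follows. The sketch is written in Python rather than the game’s C#, and the linear scaling, the pitch range, and the names `SymptomState` and `auto_symptoms` are assumptions made for this example, not the values or identifiers used in the game.

```python
from dataclasses import dataclass

MAX_SEVERITY = 5  # the text states a maximum severity of 5


@dataclass
class SymptomState:
    heartbeat_volume: float  # 0.0 (silent) to 1.0 (full volume)
    heartbeat_pitch: float   # 1.0 is the unmodified pitch
    violin_volume: float     # 0.0 (silent) to 1.0 (full volume)
    shake_camera: bool       # whether the camera shake should run


def auto_symptoms(severity: int) -> SymptomState:
    """Map a severity level to audio levels and a camera-shake flag.

    The linear scaling below is an assumed stand-in for the per-case
    values of the switch statement shown in Fig. 6.
    """
    level = max(0, min(severity, MAX_SEVERITY))  # clamp out-of-range input
    if level == 0:
        # Severity 0 is the only case in which the camera does not shake.
        return SymptomState(0.0, 1.0, 0.0, False)
    fraction = level / MAX_SEVERITY
    return SymptomState(
        heartbeat_volume=fraction,
        heartbeat_pitch=1.0 + 0.5 * fraction,  # assumed pitch range
        violin_volume=fraction,
        shake_camera=True,
    )
```

Calling `auto_symptoms` once per frame with the current severity would reproduce the behavior described: louder and higher-pitched heartbeat, louder violin, and camera shake for every level above 0.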

Fig. 7 Camera shake function

The function takes two inputs: a boolean (called ‘shake’) that determines whether or not the camera should be shaking, and a float (called ‘magnitude’) which controls the severity of the shaking. The first-person camera’s original position is saved in the Vector3 called ‘originalPos’ – this is the position the camera is returned to when the shaking stops. The shaking effect is achieved by displacing the camera on the X and Y axes, setting each coordinate to a random float multiplied by the given magnitude. The camera’s Y position is slightly offset to prevent the camera from moving below its original position.
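The camera displacement described above can be sketched in a few lines. This is a hedged Python illustration, not the C# code from Fig. 7: the random ranges are assumptions, and keeping the vertical draw non-negative is only one possible way to meet the stated constraint that the camera never drops below its original position.

```python
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]


def camera_shake(original_pos: Vec3, shake: bool, magnitude: float,
                 rng: random.Random) -> Vec3:
    """Return the camera position for the current frame.

    original_pos plays the role of 'originalPos' in the text; the
    sampling ranges are assumptions made for this sketch.
    """
    if not shake:
        # When shaking stops, the camera returns to its saved position.
        return original_pos
    ox, oy, oz = original_pos
    dx = rng.uniform(-1.0, 1.0) * magnitude  # horizontal jitter, either side
    dy = rng.uniform(0.0, 1.0) * magnitude   # vertical jitter, upward only
    return (ox + dx, oy + dy, oz)            # Z is left untouched
```

Passing the random generator in explicitly keeps the sketch deterministic under a fixed seed, which makes the bounds on the displacement easy to check.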

3.3 Conversations, Booleans, and whisperings

A conversation system was implemented to control which of the different dialogues need to be displayed, as well as to control how the player could progress through the story. Visually, the conversation system entails a number of UI elements including a backdrop image, a textbox, a continue button, and a title card (which displays the name of the character that the player is talking to). The different dialogues are displayed in textboxes, the contents of which are altered through code whenever the function ‘Conversations()’ is called. The function is assigned to the continue buttons’ OnClick() actions, meaning the next line of dialogue is displayed (or a narrative event will take place) whenever one of the continue buttons is clicked. This function operates on a system of booleans and if-statements (Fig. 8).

Fig. 8 Part of the Conversations() function

When the function (Fig. 8) is called, a boolean determines what needs to happen next. If the condition is met, the same boolean is set to false, some code is executed, and the function SimplyTurningBooleans() is called. Through a switch case, said function determines which boolean to set to true next after a short delay. The delay is needed because otherwise the booleans would all be turned true and false right after one another within a single call. The dialogues appear in real time, with a typewriter effect.
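The flag-and-delay mechanism described above can be illustrated with a small sketch. It is written in Python under stated assumptions: the class `Conversation`, its method names, and the `tick` call that stands in for elapsed game time are hypothetical and do not reproduce the Conversations() or SimplyTurningBooleans() code from the game.

```python
class Conversation:
    """One boolean per dialogue line; only the active flag may fire."""

    def __init__(self, lines, delay=0.2):
        self.lines = list(lines)
        self.flags = [False] * len(self.lines)
        if self.flags:
            self.flags[0] = True   # the first line is ready immediately
        self.delay = delay         # assumed delay, in seconds
        self._pending = None       # (index of next flag, seconds remaining)

    def on_continue_clicked(self):
        """Show the active line, clear its flag, and schedule the next flag."""
        for i, active in enumerate(self.flags):
            if active:
                self.flags[i] = False
                if i + 1 < len(self.lines):
                    self._pending = (i + 1, self.delay)
                return self.lines[i]
        return None  # no flag is set yet, so nothing happens

    def tick(self, dt):
        """Advance time; set the next flag once the delay has elapsed."""
        if self._pending is None:
            return
        index, remaining = self._pending
        remaining -= dt
        if remaining <= 0:
            self.flags[index] = True
            self._pending = None
        else:
            self._pending = (index, remaining)
```

Because the next flag becomes true only after the delay, a second click that arrives too early returns nothing instead of skipping ahead, which is the cascade the delay is meant to prevent.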

Additionally, Thomas’s hot thoughts (in the form of designed thought bubbles, as seen in Fig. 3) will start appearing, alongside whisperings, which are meant to represent how someone with SAD might believe others think negatively of them. Players progress through the narrative by advancing through the different dialogues, navigating the environments, interacting with characters, and completing the different CBT gameplay scenarios. The constant appearance (referred to as ‘spawning’) of the hot thoughts is controlled by a coroutine called SpawnIntrusiveThought(), as seen in Fig. 9. The function takes a string input (‘situation’) which determines, through a switch statement, which of the three trigger thoughts (button objects) to spawn. The booleans called ‘doSpawn’ are set to false (depending on the switch case), ensuring that no duplicate buttons are created.

Fig. 9 The SpawnIntrusiveThought() function

The execution of the function is suspended for a duration given by the float values called ‘bubbleTimer’, which are random float values within specified ranges (Fig. 9). Then, the IntrusiveThoughts() function is called, which activates the specified button GameObject, places it at a random position on the screen, and displays one of the pre-set trigger thoughts in its text component. Lastly, the ‘autoSymSeverity’ is accessed from the script of either Scenario 3 or Scenario 4 (depending on which is the current scenario), and is incremented by one (e.g. ‘scenario3Script.autoSymSeverity++’). As a result, the autonomous physical symptoms worsen every time a trigger thought appears – the idea being that a person suffering from SAD would become more and more anxious the more trigger thoughts they had.
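The spawning sequence (guard flag, random wait, random placement, severity increment) can be sketched with a Python generator standing in for the Unity coroutine. Everything specific in the sketch is an assumption for illustration: the situation names, the thought texts, the timer ranges, the screen size, and the cap at 5 (taken from the maximum severity mentioned in Section 3.2) are not taken from the code in Fig. 9.

```python
import random

# Hypothetical trigger thoughts and wait ranges (seconds), one per situation.
THOUGHTS = {
    "staring": "Everyone is staring at me.",
    "judging": "They think I am awkward.",
    "laughing": "They are laughing at me.",
}
TIMER_RANGES = {"staring": (1.0, 3.0), "judging": (2.0, 4.0), "laughing": (3.0, 5.0)}


class Scenario:
    def __init__(self):
        self.auto_sym_severity = 0
        self.do_spawn = {name: True for name in THOUGHTS}  # 'doSpawn' flags
        self.active = []  # spawned bubbles: (situation, text, position)


def spawn_intrusive_thought(scenario, situation, rng, screen=(1920, 1080)):
    """Generator that yields the wait time, then spawns one thought bubble."""
    if not scenario.do_spawn.get(situation, False):
        return  # already spawned (or unknown): no duplicate bubble
    scenario.do_spawn[situation] = False
    low, high = TIMER_RANGES[situation]
    yield rng.uniform(low, high)  # 'bubbleTimer': suspend before spawning
    position = (rng.uniform(0, screen[0]), rng.uniform(0, screen[1]))
    scenario.active.append((situation, THOUGHTS[situation], position))
    # Each new thought worsens the symptoms by one level (capped here at 5).
    scenario.auto_sym_severity = min(scenario.auto_sym_severity + 1, 5)


def run(coroutine):
    """Drive a coroutine to completion; return the total requested wait."""
    return sum(coroutine)
```

A game loop would sleep for each yielded duration instead of summing it; the `run` helper only makes the sketch easy to exercise outside an engine.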

4 Evaluation methods

4.1 Study design

This study was based on three different iterations (study 1, study 2, and study 3), all included within a formative evaluation framework.

The foundation and basic elements of the serious game were the same for all three studies. However, minor changes (based on the formative evaluations) were made to the game content from study 1 to study 2, and from study 2 to study 3. The aim of study 1 was to select and evaluate the scenarios in the serious game (n = 28). Study 2 was an evaluation study with the main purpose of selecting the characters, story, and evaluation setup (n = 15). Study 3 was the final user evaluation study (n = 28). In total, this study involved 71 university students, all within the 18–31 age range.

For all three studies, participants were first asked to carefully read and sign a provided consent form. They were then asked to answer a few background questions (age, gender, field of study, game genre preferences, and the degree to which they considered themselves gamers). Before starting the actual game, detailed instructions on how to play the game were provided. We provided all participants with anonymized ID numbers, and all the data were labeled with these IDs. The data were kept in an encrypted database. For ethical reasons, during recruitment, we did not ask if the participants were diagnosed with social anxiety disorder. We applied special ethical considerations for the interviews, data analysis [28], and a specific checklist for research-related data processing from the university. Legal access, permission, and consent were obtained. Furthermore, we took particular care because the participants could potentially be exposed to sensitive topics by playing through a serious game that attempts to emulate anxiety-provoking situations and the feeling of having SAD. These considerations were implemented by following guidelines within sensitive interviewing techniques [29, 30], as well as by providing a relaxed atmosphere [29, 30].

4.2 Study 1: Scenario selection

Study 1 included 28 participants (18 males; 10 females). The aim of study 1 was to validate the game scenarios [31]. The player experienced a story world within a university context through the eyes of Thomas, a university student who has SAD. There were three different scenarios that all included different social anxiety-provoking events. The scenarios and specific elements of social anxiety were developed based on both a literature review [1,2,3,4,5, 9, 12,13,14] and collaboration and co-design with a chief psychiatrist with 30 years of expertise in social anxiety. The initial design planning included several interviews with the chief psychiatrist, and the three scenarios were mainly based on his expertise. The three scenarios were classroom, cafeteria, and group exam settings, in that order.

4.2.1 Procedure

Study 1 followed an explanatory sequential mixed-method approach with psychophysiological methods (galvanic skin response and heart rate, measured by Mionix Naos QG), followed by interviews with included card sorting. The pilot testing with the included mixed methods was used to determine and implement successful game play scenarios [31]. Twenty-eight students participated in the pilot testing, and arousal (low, medium, and high level) was measured using the psychophysiological methods [31]. The interviews with card sorting were used as a way for participants to talk about their emotional states within specific game elements in the scenarios [31]. Study 1 revealed that all participants had an easy time making sense of in-game events. Understanding of the characters was also clear, and participants easily recognized the storyline. Scenario 4 (falling over with a tray of food in front of a crowd of people in the cafeteria) was the scenario for which the strongest reactions (highest arousal) were reported among the players in study 1 [31]. However, we had to exclude the developed exam scenario due to low arousal and a low number of participants who recalled the scenario [31].

4.3 Study 2: Gameplay, story, and evaluation

The purpose of study 2 was to select the characters, story, and evaluation setup. Study 2 included 15 participants (13 males; 2 females).

4.3.1 Procedure

The procedure for study 2 followed the overall design as outlined in Fig. 10. First, preliminary information was gathered, including the informed consent and information about age, gender, field of study, and game genre preference. This was followed by a few knowledge check questions to determine if the participants had any prior knowledge of CBT skills. After this, the participants could play the game. The evaluation was divided into two parts. In evaluation 1, the aim was to identify whether players found the game and its narrative engaging. In evaluation 2, the aim was to identify, by means of a knowledge check, whether participants could recall the intended knowledge of CBT skills.

Fig. 10 Procedure followed for evaluation and learnings of CBT skills

Evaluation 1 took place in a quiet room in an effort to minimize external distractions that might otherwise influence the participants’ engagement with the game. After the game, self-reporting, with triangulation of a questionnaire and interview, was used. The questionnaire consisted of 12 Likert items from the User Engagement Scale, short form (UES-SF) [18], and 7 Likert items from the Narrative Engagement Scale [20]. The rating scale of the Narrative Engagement Scale was changed to a 5-point Likert scale to match the UES-SF. The participants were interviewed about their engagement with the game and its narrative, respectively, following a semi-structured interview guide. Their overall experience with the game was also covered. The interview guide consisted of six overall themes: 1) Thoughts about the game design. 2) Potential changes to the game design. 3) Most worthwhile game experience. 4) Least worthwhile game experience. 5) How are the CBT skills included (positive/ negative). 6) Most effective CBT skills/scenario.
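For readers unfamiliar with how a 12-item, four-subscale instrument such as the UES-SF is typically scored, a minimal sketch follows. It assumes three items per subscale (focused attention, perceived usability, aesthetic appeal, reward), reverse-coding of the perceived usability items, and subscale scores computed as item means; the item order and the names used are assumptions for this example, and the grouping of question items in Table 1 may differ.

```python
from statistics import mean

# Assumed item positions (0-based) for each subscale in a 12-item response.
SUBSCALES = {
    "FA": [0, 1, 2],    # focused attention
    "PU": [3, 4, 5],    # perceived usability (negatively worded items)
    "AE": [6, 7, 8],    # aesthetic appeal
    "RW": [9, 10, 11],  # reward
}
REVERSED = {"PU"}


def score_ues_sf(responses, scale_max=5):
    """Return per-subscale means and an overall mean for one participant."""
    if len(responses) != 12:
        raise ValueError("expected 12 Likert responses")
    scores = {}
    for name, indices in SUBSCALES.items():
        items = [responses[i] for i in indices]
        if name in REVERSED:
            # On a 1..5 scale, a response of 2 becomes 4, and so on.
            items = [scale_max + 1 - value for value in items]
        scores[name] = mean(items)
    # Equal-sized subscales: the mean of subscale means equals the item mean.
    scores["overall"] = mean(scores[name] for name in SUBSCALES)
    return scores
```

With reverse-coding applied, a higher perceived usability score means fewer reported usability problems, so all four subscales point in the same direction.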

In evaluation 2, a questionnaire was used with nine question items, consisting of three multiple-choice questions and six open-ended questions. Each question evaluated participants’ ability to recall information (knowledge check) about the CBT skills implemented in the game. The questionnaires from both evaluations 1 and 2 were analyzed by cumulative frequency. The interviews were analyzed by traditional coding [32] following four steps: organizing, recognizing, coding, and interpretation. The interviews were transcribed verbatim to be organized and prepared for data analysis. The transcriptions were read several times by two researchers to recognize the concepts and themes, which also included a general sense of the information and an opportunity to reflect on its overall meaning. Researchers then coded and labelled the data in categories/subcategories, followed by interpretation.

4.4 Study 3: Final user study

Lockdown due to the COVID-19 outbreak began halfway through study 3, which resulted in changing the original in-person lab setting to an online evaluation. A total of 28 participants were recruited for study 3 (16 males; 12 females).

4.4.1 Procedure

The procedure in study 3 was similar to that outlined in Fig. 10 and included the two evaluations and the items used from the Narrative Engagement Scale and UES-SF. The main difference was that the evaluation in study 3 took place online. The main focus of study 3 was to conduct a final user evaluation study, based on changes prompted by the findings in study 2. The content and scenarios in study 3 were made in co-design with a psychologist with expertise in social anxiety. The co-design with the psychologist also served to secure ethical approval for the evaluation procedure, with a special focus on no risk for the participants.

5 Results and discussion

The serious game was successful in terms of absorption, attention, aesthetic appeal, and being worthwhile. Results from both studies 2 and 3 were similar, with highly positive feedback on the game (Table 1). We only used frequency and standard deviation as a means to reveal the results. For discussion, further statistical analysis could potentially be used (e.g., Mann-Whitney U test, two groups, nonparametric data). However, there are several reasons for not including statistical significance. First, we wanted to be generally careful about overkill in statistical analysis within this exploratory study of serious gaming. There could be major challenges in both the specific tool and in the interpretation. This study was more of an exploratory study with much focus on whether the use of serious gaming can increase awareness of CBT skills, and not as such a (null) hypothesis study. For proper significance testing, comparisons of hypotheses should be conditional on the data, which was not the case in this lab and online study. The variability was rather large in this study (and in many other serious gaming studies), meaning that it was a challenge to interpret strict statistical values (e.g., p value). The most important question here was not the size of the gaming effect, but rather if there was any perceived awareness of social anxiety disorder and CBT skills.

Table 1 Questionnaire results, study 2 and 3

5.1 Attention, usability, and aesthetic engagement

The focused attention (Q1–3, Table 1) was higher than expected in both studies 2 and 3, as the immersive qualities were not prioritized in this game. Instead, the focus of this game was to deliver its message by explaining different CBT skills. The focused attention was also revealed in the interviews. Some of the participants specifically stated having experienced a sense of immersion in the game. A couple of participants felt that their level of engagement remained the same throughout the game because

“it was a very coherent experience… I don’t think it lost my attention actually” (ID 1, male, aged 23, Study 3).

“it was very simple, but helped to get the message through...and by that the game was immersive” (ID 3, male aged 26, Study 2).

It is also interesting that in the interviews, several participants mentioned that they became more absorbed in the game during the game play, also due to an interest in the narrative. However, some participants mentioned that the recap at the very end of the game was a bit disengaging and lost their attentional focus.

It is interesting that in study 3, we found differences in terms of game genre preferences and their correlation with focused engagement (Q1–3). Participants who preferred the action genre (n = 7) rated their engagement lower (mean 2.8) than did participants who preferred role-playing games (n = 6) or adventure games (n = 4) or described themselves as “non-gamers” (n = 8). Participants who preferred role-playing or adventure games and non-gamers had a higher aggregated focused engagement score, respectively 3.5, 3.3, and 3.4. Given the nature of the action genre (fast-paced, challenging, reliant on hand-to-eye coordination), it is unsurprising that participants who preferred that genre were not equally engaged, as the current implementation favors role-playing elements over action elements (e.g., exploration, dialogue). This is further emphasized by the results of the Likert item “This game appealed to my senses” (Q5, Table 1), for which the “action” participants scored 2.3, while both “non-gamers” and “role-playing gamers” scored 3.5. The findings reveal the importance of considering the target group. It is not possible to develop a one-size-fits-all game. Instead, there is a need to gain knowledge (analysis) of the target group, as well as to be consistent regarding the genre and style.
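To make the size of the genre gap concrete, the group means reported above can be pooled with a size-weighted average. The following Python sketch uses only the group sizes and means stated in the paragraph; the variable and function names are chosen for this example, and the calculation is descriptive arithmetic, not a significance test.

```python
# (group size, reported mean focused-engagement score) from the text above.
# The four groups listed account for 25 of the 28 participants in study 3.
groups = {
    "action": (7, 2.8),
    "role-playing": (6, 3.5),
    "adventure": (4, 3.3),
    "non-gamer": (8, 3.4),
}


def pooled_mean(names):
    """Size-weighted mean across the named groups."""
    total_n = sum(groups[name][0] for name in names)
    weighted_sum = sum(groups[name][0] * groups[name][1] for name in names)
    return weighted_sum / total_n


others = pooled_mean(["role-playing", "adventure", "non-gamer"])  # about 3.41
gap = others - groups["action"][1]                                # about 0.61
```

The pooled mean of the three other groups is about 3.41, roughly 0.6 scale points above the action group’s 2.8 on the 5-point scale.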

Participants rated the game’s perceived usability above average in both studies 2 and 3. However, it is interesting that the usability was rated lower in study 3 than in study 2, despite improvements to the game. The median and mode values were the same (3.5 and 4.45, respectively) in study 3. The standard deviation for this category was also high, at 1.2. This suggests that several participants experienced some degree of usability problems in study 3. For study 3, the game play was slightly more advanced, and more scenarios were developed (see Section 3). A number of usability issues were also reported in study 3. Some participants (IDs 3, 15, 20) were confused about the camera’s shaking effect (Fig. 7). Other participants felt that it was not obvious how the mindfulness training worked and whether it was mandatory to continue the various scenarios in the game. In study 3 in particular, some bugs were also reported (the game stopped/had to restart), as well as some problems with how to continue the dialogues.

The lower perceived usability score in study 3 (the decreased score from 4.6 in study 2 to 3.2 in study 3) can also be explained by the higher number of participants in study 3, the fact that participants in study 3 had more diverse backgrounds, and the fact that 8 participants did not consider themselves gamers, compared with only one in study 2.

The game’s aesthetic appeal included the participants’ views on the game’s visual and graphic design. In both studies 2 and 3, the game’s aesthetic appeal was rated high (Q5–6, Table 1). Most participants’ comments on the aesthetic appeal in the interviews were about the game’s graphical fidelity, with positive remarks:

“Visually, it was good. When you talked with someone, they turned to you, which was nice. [...] I do not think it was necessary for the graphics to be more realistic” (ID 19, female, aged 23, Study 3).

Few participants disliked the aesthetics; dissatisfaction was related to the aesthetic of the camera shake function.

Several participants also commented on the game’s audio design. Most participants liked the ambient sounds and other sound effects, as they helped “make the game more immersive” (ID 3, female, aged 20, Study 3) and helped them “get drawn into the scene” (ID 19, female, aged 23, Study 3). Voice acting was implemented in the developed game in study 3, based on some critiques of the game’s audio and sound from study 2.

5.2 Worthwhile playing, the narrative, and knowledge check

One question (Q7, “Playing this game was worthwhile”) was a bit difficult to interpret, as this question is slightly generic, and due to the online testing for study 3, we were not sure in what context the participants were playing. However, it is interesting that participants rated the game a bit higher in terms of being worthwhile when seated in a non-laboratory environment, where the only focus was the game.

The participants rated narrative understanding in study 2 (Q8–9) very highly, with a score of 4.8. The median and mode values were both 5, and the standard deviation was low, at 0.45. The high score (and the majority of answers being 4 or above) indicates that participants had an easy time understanding the narrative, which is imperative to avoid players’ losing interest [20]. From the in-depth interviews, there were generally positive comments about the story being “a clear narrative” (ID 3, male, aged 26, Study 1), or,

“I think it was easy to understand….And I think everyone could have someone they know who has these kinds of issues and could relate through them” (ID 8, male, aged 21, Study 2).

“I felt that the story had a natural progression” (ID 1, male, aged 23, Study 3).

There were only a few suggestions for improvements to the story, mainly regarding the introduction to Sam, the non-player character, and that the game at some points was a bit too tedious.

The knowledge check, which aimed to measure whether participants could remember the CBT skills and information provided, was conducted immediately after the game play session. The evaluation 2 questionnaire revealed an aggregated mean of 47.5% correct answers in study 2 and 66.4% in study 3 (Table 2). The share of correct answers in study 2 was lower than expected (objective: 60% correct recalls). This low accuracy score may be explained by a lack of motivation for the game (caused by target group, genre preferences, and sampling), and it suggests reconsidering the game play so that specific CBT information is paired with stronger (high arousal) elements. Previous studies have shown that stimuli producing high arousal are remembered better than stimuli producing low arousal [25, 33]. Such high arousal elements could be implemented through design elements (light and sound) and by making the game more challenging. Furthermore, the game could be developed to allow for increased personalization. A few changes aimed at improving the number of correct recalls were implemented in the game for study 3, including a recap at the end of the game. Despite some critique of the recap during the interviews (e.g., that it was boring), it might have helped promote better recall.

Table 2 Recall results, study 2 and 3

However, the results differed among the elements asked about. The CBT information was grouped into three overall CBT skills: cognitive restructuring, mindfulness training, and exposure therapy (Table 2), which were related to the different scenarios in the game play. The highest accuracy score was for cognitive restructuring (Table 2). The lower scores for mindfulness training and exposure therapy were expected because, in evaluation 1, these scenarios were found to be less worthwhile to experience than cognitive restructuring.

5.3 Suggestions for improvements

In general, the usability was excellent, but some participants experienced technical challenges inherent to the game. These included a few bugs where the game stopped running smoothly, screen resolution difficulties, and poor sound quality. However, all of these could be avoided in a more controlled environment, with the participants using the same laptop and high-resolution screen. Such a setting would also ensure that participants share the same contextual experience (playing in the same environment), without negative external effects on the players’ experiences. On the other hand, an online evaluation can potentially include a higher number of participants.

There were few, but good, suggestions for improvement from the participants. The most common suggestion was to personalize the game, with options for the players including, for example, further scenarios, choice of single-player/multiplayer game, change of characters, and inclusion of personal life experiences. Some participants made suggestions for the mindfulness training scenario (scenario 2) in particular. Participants remarked that they felt that what they were meant to do (click on the trigger thoughts as they appear) was not clear enough. Furthermore, some participants believed the trigger thoughts appeared too quickly and suggested making the rate at which they appear more gradual, by starting slow and then speeding up (or vice versa).

In particular, some of the participants who preferred the action game genre suggested adding more action to the game, such as changing the mindfulness training scenario so that players have to shoot down the trigger thoughts instead of simply clicking on them. In contrast, there was also a suggestion to make the game even more passive and to give players further mental space and time for reflection (e.g., by ignoring the trigger thoughts).

6 Conclusions and future research

The designed serious game, which aimed to engage university students in raising their awareness of CBT skills associated with social anxiety disorder, was successful in terms of focused attention, perceived usability, aesthetic appeal, being worthwhile, and narrative understanding. The game’s elements were perceived clearly, and the game was easy for players to understand. The story was fitting, and the narrative was clear and understandable. Participants also made very positive remarks about the game quality, as well as the societal aim and purpose.

However, the designed elements for emotional engagement could be improved (e.g., by allowing more in-game personalization with the inclusion of the users’ own life experiences). The freedom for further personalization could also enhance the game’s narrative, as players could play as themselves instead of identifying with another character and his/her problems. The in-game personalization could also be improved through gamers’ own choice of the characters.

The evaluations suggested that the designed serious game could be more challenging. However, when the aim is to transfer information about CBT skills and SAD, a balance must be struck between a serious game that provides multiple stimuli (e.g., aesthetic appeal, narrative understanding, complex interactions, and sound and visual effects) and one that leaves mental focus for learning information. In terms of the objective of a minimum 60% accuracy score in the post-game knowledge check, we were successful in study 3 but not in study 2, where participants recalled information from the game with less than 50% correct answers. This accuracy score was lower than expected, which could be due to the sampling methods and/or too little game design focus on the information to be recalled. Therefore, in study 3, emphasis was placed on improving the game so that participants could recall information to a higher degree, mainly through the inclusion of an introduction and a recap. The share of correct recalls in study 3 was 67.6%, an increase of about 20 percentage points over study 2.
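The comparison against the objective can be made explicit with a short calculation. The following is a minimal sketch in Python, assuming the aggregated accuracy scores quoted in the paragraph above (47.5% and 67.6%); the function and variable names are hypothetical and are not taken from the study’s analysis.

```python
# Illustrative sketch only: names are hypothetical, and the values are the
# aggregated accuracy scores quoted in the conclusions paragraph above.

OBJECTIVE = 60.0  # minimum accuracy (%) stated in the study objective

reported_accuracy = {"study 2": 47.5, "study 3": 67.6}


def meets_objective(accuracy: float, threshold: float = OBJECTIVE) -> bool:
    """Return True if the accuracy score reaches the stated objective."""
    return accuracy >= threshold


def point_change(before: float, after: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Change relative to the earlier score, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for study, accuracy in reported_accuracy.items():
        status = "met" if meets_objective(accuracy) else "not met"
        print(f"{study}: {accuracy:.1f}% (objective {status})")

    before = reported_accuracy["study 2"]
    after = reported_accuracy["study 3"]
    print(f"change: {point_change(before, after):.1f} percentage points")
    print(f"relative change: {relative_change(before, after):.1f}%")
```

Under these assumptions, the difference between the two studies is about 20 percentage points, which corresponds to a relative increase of roughly 42% over the study 2 score. Keeping the two measures apart avoids ambiguity when reporting the improvement.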

Engagement and awareness are rather complex to measure. Therefore, there are good validity and reliability reasons to combine different methods and thereby increase the number of data sources. Furthermore, there is potential in using psychophysiological measures to supplement the widely used self-reporting in the evaluation of serious games. Psychophysiological measures were useful in this study for the go/no-go decision on scenarios, as emotional engagement was also part of the serious game. However, substantial work is required within the research design to set up useful psychophysiological measures, and much further work is needed regarding how psychophysiological data could and should be interpreted when used in the design of serious games.

Immersion and a sense of presence are two important factors in creating believable scenarios that elicit social anxiety. HMDs/VR could also be used to provide the expected increase in emotional engagement. However, it is worth noting that VR has limitations in terms of potential motion sickness, lack of real-world vision, unnatural head movements, and a more complex setup for the users.

Future work is needed to generate significant evidence and insights regarding students’ learning of CBT skills via serious gaming. First, a much higher number of participants is needed, and control groups should be included in the research design. Second, further details on the identification of gamers are needed (e.g., their confidence in serious gaming and game genre preferences). It is important to emphasize that there is no established taxonomy of serious gaming, and serious games are still diverse in their outcomes and certainly understudied as a means to provide knowledge about SAD. It would also be interesting to create different options in the game design for target groups other than students, as well as to make the game more personalized with the inclusion of the participants’ own life stories.