1 Introduction

Speech anxiety, also known as fear of public speaking or glossophobia (Hancock et al. 2010), is identified as a social phobia or Social Anxiety Disorder (SAD) and is one of the most prevalent fears worldwide, affecting approximately 75% of the population (Rahmawati et al. 2018). It has been shown that fear of public speaking can affect a speaker both physiologically (e.g., dry mouth, increased blood pressure, sweating) and psychologically (e.g., fear of humiliation or embarrassment, and of being judged negatively by others) (Pertaub et al. 2002; Kushner 2010). Early research has shown that this kind of anxiety can be attributed to various factors, including public speaking skills, language fluency, context, or individual characteristics (Beatty et al. 1989). It is therefore understandable that individuals who suffer from speech anxiety require support in changing their response to situations that may cause social anxiety.

Oral communication is one of the most important forms of public speaking and is consistently rated as one of the most valued workforce skills (Kyllonen 2012). Previous research identified a number of rubrics for a baseline assessment of public speaking proficiency (Quianthy and Hefferin 1999; Morreale 1990; Thomson and Rucker 2002; Lucas and Stob 2020). Specifically, public speaking performance is assessed based on a number of competencies that span both verbal and non-verbal aspects. However, whilst it has been observed that there is an expected overlap of competencies assessed across these rubrics, most of the work and identified competencies are around improving verbal communication skills. Previous work on improving non-verbal communication skills is limited. Here, we specifically draw from the work proposed by Lucas and Stob (2020) and Thomson and Rucker (2002) who present key competencies that specifically pertain to non-verbal communication.

In recent years, the use of Virtual Reality (VR) as a method of treating anxiety disorders has become more prevalent, as its use as an aid to healthcare provision has been developing rapidly (Greenleaf 2016). Given its capability to gradually expose individuals to virtual equivalents of real phobias and to eliminate many real-world constraints, it can act as a valuable method of treating social phobias such as speech anxiety (Anderson et al. 2013). Although there are reports about the use of VR for social anxiety disorders, there is limited work on the use of VR for non-verbal communication. Similar commercial solutions are currently available; however, these appear to be primarily targeted at enterprise usage, which, whilst helpful, means that they are not easily accessible to regular users who need such support. The continuing reduction in the cost of VR equipment, which previously hindered implementation of such solutions, combined with the capability that VR offers to carefully monitor and tailor a virtual environment to an individual’s needs (American Psychological Association 2017), is expected to present an important opportunity for such VR solutions to be more widely adopted for the intended purpose, despite restrictions and ongoing challenges such as cybersickness (Tian et al. 2022) or cyberattacks (Odeleye et al. 2022), which can have an impact on user experience. Accordingly, in this work we present an evaluative study of a VR intervention with instruction-based live feedback for use in adults with speech anxiety who need support with their non-verbal communication skills. The primary aim of this study was to determine any efficacy in positively changing perceptions towards factors affecting non-verbal communication anxiety. This paper is structured as follows. Section 2 presents the background to this work, whilst Section 3 discusses the design and implementation approach and decisions. Section 4 then presents the evaluation study and findings. Finally, Section 5 identifies the main findings, contributions and limitations of this work.

2 Related work

This section presents related work on social anxiety disorders, factors affecting public speaking, as well as treatment approaches related to social anxiety disorders with a particular focus on VR and Exposure Therapy methods.

2.1 Social anxiety disorders

Social anxiety disorder is characterised by the persistent fear of scrutiny by others due to a belief that such scrutiny will lead to negative evaluation and rejection (Kashdan et al. 2013). Whenever possible, people with SADs will attempt to avoid their most feared situations. However, this is not always feasible, and they will be required to endure the situation, often with a feeling of intense distress. Social anxiety is one of the most common of all anxiety disorders, with reports showing lifetime prevalence rates of up to 12%; by comparison, estimates for other anxiety disorders are 6% for generalised anxiety disorder, 7% for post-traumatic stress disorder (PTSD) and 2% for obsessive-compulsive disorder (OCD) (National Institute for Health and Care Excellence 2022).

People who suffer from social anxiety experience intense and persistent fear of drawing attention to themselves in social situations, worrying that their flaws will be exposed (Kashdan and Farmer 2014). This is also referred to as self-focused attention, which has been defined as “an awareness of self-referent, internally generated information” (Ingram 1990). For instance, whilst present and/or on display in front of an audience during a public speech, individuals with social anxiety will exhibit extremely high levels of self-focused attention, limiting their ability to identify and absorb external cues from the audience. This negative self-focus can detrimentally affect their self-confidence and speech performance. In the case of highly anxious speakers focusing less on external cues, it has been shown that they cannot quickly adapt to changing situations and audience reactions (Daly et al. 1989). This negative performance can lead to future public speaking anxiety as the individuals fear that they will under-perform again. It is proposed that those who have speech anxiety do so because they fear that they will act in a humiliating or embarrassing way and that others will judge them negatively (Pertaub et al. 2002). Amodeo (2014) discussed that the fear of rejection is one of our deepest human fears, as we are biologically wired with a longing to belong.

Some cognitive-behavioural models have suggested a correlation between increased self-focused attention and the maintenance of SADs. Clark and Wells (1995) proposed that those with SADs are excessively self-focused during anxiety-provoking social situations, preventing them from noticing external social cues that can disconfirm their negative expectations. Instead, those with SADs use internal cues in order to evaluate their performance. There is significant evidence to support this theory, as individuals with social anxiety report higher levels of self-focused attention than individuals without; a correlation can therefore be drawn between self-focused attention and increased levels of anxiety (Woody and Rodriguez 2000).

Daly et al. (1989) highlighted that the fear of public speaking has also significant social impacts, as individuals who fear speaking before others face many issues when pursuing their career goals and find limited opportunities for promotions. This consequently can lead to considerable personal agony and dissatisfaction (Pertaub et al. 2002). Additionally, Wade (2012) looked into the economic implications of SADs and identified that those suffering from such a disorder had a greater number of workdays missed. This inability to attend work can negatively affect someone’s ability to maintain consistent employment, consequently affecting their income. In this study, we address the issue of self-attention by offering a solution that prompts users to focus more on important external cues, such as eye contact from/with audience and time awareness, through the provision of live instruction-based feedback that points a user to learn to identify and focus on these cues.

2.2 Factors affecting public speaking

The reasons why people with such SADs experience fear of public speaking are numerous. Rajitha and Alamelu (2020) recently found that there is a clear division between each individual’s unique internal and external factors that can cause them to become anxious when speaking publicly. They identified seven primary anxiety factors: four were recognised as external factors (language, grammar, pronunciation and peer) and three as internal factors (stage fear, lack of confidence, and shyness).

More specifically, Rajitha and Alamelu (2020) demonstrated that individuals fear that they are going to be judged negatively by their peers in a public environment, which links back to Kashdan and Farmer’s (2014) explanation of SADs being caused by self-focused attention and the fear and feeling of judgement from others. The latter inevitably affects an individual’s self-confidence, as they tend to focus on themselves rather than the audience. McCroskey (2015) described that lack of self-confidence in an individual’s abilities and skills, as well as hesitation to engage in communication or interactions, such as speaking in public, are inter-linked due to fear of peer judgement. Fear of public speaking can be further increased when shyness is also factored in, as past research indicated that shy individuals are associated with lower performance due to their reactions to stressful situations (Crozier and Hostettler 2003). This therefore points to the need for a solution that will help increase an individual’s self-confidence.

The peer factor can be broken down even further into the environment in which the peers are present and whether the evaluation the individual is receiving is formal or informal. MacIntyre et al. (1997) found that it was more anxiety-provoking for an individual to speak before an audience who was giving formal evaluation than in a context where the speaker was not evaluated. This can be taken further, therefore, by looking into the appearance of the audience that an individual is presenting before. Mulyani (2018) recently concluded that the formality of the environment does in fact play a role in an individual’s anxiety levels whilst speaking. This peer evaluative nature and associated anxiety (Daly 1997), as well as poor preparation (Daly et al. 1995), have long been considered causes of stage fear, which is an important internal factor of public speaking-associated anxiety.

The number of people before whom an individual is speaking has also been shown to affect their level of anxiety, as the increased audience size contributes towards magnifying one’s self-focused attention (Kashdan and Farmer 2014). Pratama (2018) found that the audience’s capacity plays a significant role in the speaker’s levels of anxiety. Finally, Asakereh and Dehghannezhad (2015) identified that a key aspect of any form of public speaking is the time given not only to practice but also to perform. Paradewari (2017) expanded on this idea by looking into students’ self-efficacy of public speaking, showing that when given a time constraint within which the student was required to speak, they were able to articulate their points more concisely and felt more confident due to the structure that they had laid out for themselves. It also has to be noted that earlier research showed that the level of anxiety associated with public speaking is usually higher during the period of anticipation, i.e. before public speaking begins (Sawyer and Behnke 1999). Work from the same authors further demonstrated that the peak of associated anxiety is at the very end of the period of anticipation, i.e. right before public speaking occurs, whilst the second highest peak occurred at the announcement of a public speaking assignment (Behnke and Sawyer 1999).

Whilst there is considerable work into strengthening external factors associated with public speaking, specifically verbal and vocal variables, early work by Mehrabian (1981) resulted in the “7-38-55” rule, which indicates that only 7% of all communication is done through verbal communication, whilst the non-verbal components, such as the tone of our voice and our body language, make up 38% and 55% respectively. For a public speaking performance to be deemed successful, it is important that both verbal and non-verbal components are balanced and in alignment (Španjol-Marković 2008; Mehrabian 1981); however, further research discussed that non-verbal communication is often neglected (Pizek Meštrić 2016). This therefore calls for an efficient solution to support the improvement of non-verbal communication.

Accordingly, in this work we particularly address those contributing factors that are seen to influence an individual’s non-verbal communication, namely peer, stage fear, lack of confidence, and shyness through a VR simulation that takes place after an announcement of public speaking and before it occurs. To this end, we designed the VR intervention’s content guided by the targeted contributing factors above (see also Sect. 3.2).

2.3 Social anxiety treatments

There are a number of treatments for SADs, such as public speaking anxiety, including Cognitive Behavioural Therapy and Exposure Therapy. Whilst effective, these treatments also have limitations. Studies by Hofmann and Smits (2008), Hofmann and Otto (2017) and more recently David et al. (2018) have looked into the overall effectiveness of Cognitive Behavioural Therapy treatment, concluding that it is indeed an effective method of treating SADs. They further labelled it as the dominant psychosocial treatment and the first-line treatment for many disorders. On the other hand, Holmes et al. (2002) found that Cognitive Behavioural Therapy works well in university-based clinical trials with subjects recruited from advertisements; however, the evidence about how effective it can be in the real world of clinical practice is less secure. Nevertheless, recent work by Nath Samantaray et al. (2022) showed that Cognitive Behavioural Therapy was effective with COVID-19 related SADs. A meta-analysis of randomized placebo-controlled trials for adults with SADs found that, on average, between 5 and 20 sessions, each lasting between 30 and 60 min, were required for a patient to experience the benefits of this treatment (Hofmann and Smits 2008). The NHS indicates that a private Cognitive Behavioural Therapy session can cost between £40 and £100 (NHS 2019), which makes this a less affordable option. In fact, a significant factor affecting one’s ability to receive effective treatment for SADs is cost, as the price of standard methods of treatment can be too high for some individuals (Marciniak et al. 2005).

Exposure Therapy is another commonly used and effective method of treating SADs whereby an individual is exposed to the specific environment, object or being that they fear and would typically avoid (Boehnlein et al. 2020). It has been shown that gradual exposure to feared objects, activities, or situations within a safe environment helps to reduce fear and decrease avoidance, as the individual becomes more accustomed to being in the situation or environment that they had previously feared (American Psychological Association 2017). A number of studies that looked into the effectiveness of Exposure Therapy have come to overall positive results showing it to be an effective method of treating psychological disorders such as SADs (Sars and van Minnen 2015; McGuire et al. 2014). A recent meta-analysis confirmed the effectiveness and efficacy of Exposure Therapy for specific phobia (Odgers et al. 2022). However, there is still some debate about the methods used within Exposure Therapy with numerous studies making reference to the use of VR technology as a potential tool in aiding this method stating that a “new medium of administration for exposure therapy may be feasible for treating a subset of social anxiety symptoms” (Lin et al. 2019).

2.4 VR for social anxiety disorders

Accordingly, VR has already been successfully integrated into several aspects of medicine and psychology. For example, VR is being utilised for medical training (Manolakis and Papagiannakis 2022), treating eating and body image disorders (Cuzzolaro and Fassino 2018), post-traumatic stress and obsessive-compulsive disorders (van Loenen et al. 2022), pain assessment (Spyridonis et al. 2014; Eccleston et al. 2022), and phobias (Albakri et al. 2022). Work by Schultheis and Rizzo (2001), Anderson et al. (2013), and more recently Maples-Keller et al. (2017), demonstrated the capability of VR to measure behaviour within an ecologically valid environment whilst maintaining a complete level of safety and control, which makes VR an effective method of treatment.

2.4.1 Virtual reality exposure therapy

More relevant to this work, VR has also been extensively used as Exposure Therapy for numerous types of phobias (Carl et al. 2019; Botella et al. 2017) showing its efficacy in reducing fear of real-world equivalent phobic stimuli (Morina et al. 2015). A recent systematic review on Virtual Reality exposure treatment demonstrated its effectiveness for most phobia types, including social phobias (Freitas et al. 2021). Through Virtual Reality Exposure Therapy, a patient can be gradually exposed to the scenario or environment they fear; however, they do this whilst immersed within VR (Boeldt et al. 2019). This makes this approach a safe place to gradually push the limits of one’s anxiety disorder, which is an essential process throughout Exposure Therapy (Craske et al. 2014). In fact, Maples-Keller et al. (2017) identified that the use of artificial settings and virtually generated environments eliminates many constraints of the real world and therefore acts as a valuable method of treating SADs. Previous work by Robillard et al. (2003) demonstrated that the graphic quality of game-based environments can often be superior, which leads to a maximised effect of exposure. For example, a recent study by Hinojo-Lucena et al. (2020) showed that VR treatment for public speaking anxiety in a student population was effective while being less invasive than in vivo exposure, and a similar study by Lindner et al. (2021) produced similar positive findings in support of Virtual Reality Exposure Therapy for public speaking anxiety, this time in routine care. It has also been shown that this approach had a low dropout rate in a recent meta-analysis study (Benbow and Anderson 2019). Boeldt et al. (2019) further discussed that the use of VR can allow therapists to choose and cater content to the personal needs of patients, and that Virtual Reality Exposure Therapy is at the very least as effective as the state of the art treatment that is carried out in person, concluding that although VR may not currently be a standardised treatment method, it can make Exposure Therapy more effective. The above can significantly help with the increased levels of anxiety and depression that have been observed as a result of the recent COVID-19 pandemic (World Health Organization 2022). Accordingly, in this work we adopt the Exposure Therapy approach for treating SADs.

2.5 Research aim and approach

To this end, the aim of our work is twofold; first, work discussed above indicates that individuals with SADs exhibit high levels of self-attention when they are in an anxiety-provoking environment or social situation. In Virtual Reality Exposure Therapy, the sense of presence has been considered the main mechanism that leads to the experience of anxiety (Wiederhold and Wiederhold 2005). Accordingly, we aim to explore the ability of a Virtual Reality Exposure Therapy solution to induce a sense of presence with the working hypothesis that higher levels of presence successfully simulate an anxiety-provoking environment where individuals exhibit high levels of self-attention. Second, and building upon our first aim, we investigate the efficacy of our Virtual Reality Exposure Therapy solution in positively changing perceptions of those factors that influence non-verbal communication in adults living with self-reported speech anxiety through instruction-based live feedback cues.

3 Design and implementation

In this section we discuss the design and implementation of our proposed Virtual Reality Exposure Therapy intervention, showing how factors affecting public speaking have been mapped to specific mechanics in our solution to simulate an environment prone to public speaking anxiety conditions. It has been shown that any virtually generated environment needs to be capable of being personalised to an individual user’s needs (Harkness and Lilienfeld 1997). As such, our solution also offers instruction-based feedback based on user actions whilst shifting user attention from one’s own self to the audience.

3.1 Design framework

The goal of this VR solution is to support non-verbal communication skills in adults with speech anxiety. As the goal suggests, VR is the technology used for this solution. The factors affecting non-verbal communication need to be implemented in the story, environment and aesthetics of the VR solution. As suggested by previous work (Daylamani-Zad et al. 2016), the closer a setting is to a real-world scenario, the easier it will be to achieve the Lusory goal, which is the serious purpose of the solution. The virtual environment therefore needs to be as close to reality as possible, achieved through the inclusion of realistic entities, which can further allow for improved engagement (Sutcliffe and Gault 2004). To this end, the high-level requirements of the solution are developed according to existing research on VR interventions, serious games and factors of non-verbal communication affecting public speaking. These are grouped under three categories: Training—features required to train the user in factors of non-verbal communication, Experience—features required by the user to take advantage of an efficient solution, whilst feeling safe, and Technology—technical requirements to enable the system to address the requirements of training and experience. These are summarised in Table 1. The technology category includes features that the HMD and the code would need to handle. These include requirements such as the users’ posture, seated or standing, as it relates to the Shyness factor as described in Table 2.

Table 1 High-level requirements for a VR solution to support development of non-verbal communication affecting public speaking

These requirements led to the identification of the core concepts and their required features. To create an efficient realistic simulation which provides suitable public speaking scenarios, our VR solution is designed within a story where the user is required to speak in front of a public audience in a boardroom or an auditorium setting. The goal of the solution is implemented using a presentation mechanic where the user’s task is to present to the selected audience in one of the above settings. This is achieved using a simulated slide deck presentation on a PC, so that it closely resembles a person giving a public presentation. The VR environment needs to also include information about the task, public presentation, and suitable ambient and environment sounds and information about the simulation, as well as potential instruction-based feedback guidance on the field of view to help the user understand what aspects of the identified non-verbal communication factors they should focus on improving. In order to achieve the simulated realism necessitated in the requirements, the simulation needs to include graphical assets (environment, characters, slides, slide deck presentation, etc.) which closely represent a real-world scenario. They need to be close to realistic graphics to enhance situational awareness in the user and create a sense of presence (Agius and Daylamani-Zad 2019).

3.2 Implemented public speaking factors

We previously discussed that there are contributing factors seen to influence an individual’s non-verbal communication skills (Sect. 2.2). Accordingly, we designed our VR solution guided by these factors, mapping them to suitable VR mechanics. Specifically, and in line with our aim to induce a high sense of presence, we mapped mechanics and aesthetics to the identified contributing factors in order to create a realistic simulation that addresses these factors. Table 2 provides the mappings of the factors addressed in this solution to suitable VR mechanics. Each factor is implemented using a mechanic which fits well within a realistic VR simulation in order to increase presence and provide a close to real-world experience.

Table 2 Design components mapped to the factors based on the considerations identified in the literature

3.3 Environment and interactions in the VE

The chosen scenario and environment offers the user the opportunity to practice their non-verbal communication skills when delivering a public speech within two different types of room—a boardroom and an auditorium. This addresses peer anxiety and accommodates the level of formality of each of the above environments, as it has been shown that the formality of the environment plays a significant role in one’s public speaking anxiety (MacIntyre et al. 1997). The two rooms also facilitate believable and relatable different audience sizes.

Accordingly, we used an Oculus Rift S with a resolution of 2560 × 1440 and an 80 Hz refresh rate. The solution runs at an average of 60 fps, dropping to a low of 50 fps with a high audience population. The VR solution employs a real-walking locomotion technique whereby users can walk freely, if they choose to, inside the limited physical space (pre-defined safe play space; two steps to the left, right and back of the simulated laptop) of the boardroom or the auditorium VR space. Therefore, the user is not required to explore any large areas inside the environment apart from using natural head movements for observation, audience interaction and User Interface selection.

At the start of the simulation, the user is placed in an empty boardroom environment, which acts as a lobby for the experience and allows the user to adapt to the environment. It is also used as the navigation point to initiate the specific experiences. The user is first presented with a floating menu incorporated into a TV screen (Fig. 4). On choosing ‘Start’, the user can choose the target room in which they prefer to start practicing (Auditorium/Boardroom). The user then chooses their preferred practice conditions with variations in formality (informal/formal), capacity (low/medium/high) and the length of the presentation/practice (5/15/30 min) (Fig. 5). Figure 1 visually depicts the above as a flow diagram.

Fig. 1 System flow
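To make the menu flow described above concrete, the following is a minimal Python sketch of how one complete selection of practice conditions could be represented. It is not the implementation used in the study: the type and function names (`PracticeConditions`, `all_conditions`) are hypothetical, and only the option values (room, formality, capacity, length in minutes) are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Room(Enum):
    BOARDROOM = "boardroom"
    AUDITORIUM = "auditorium"


class Formality(Enum):
    INFORMAL = "informal"
    FORMAL = "formal"


class Capacity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Presentation lengths (in minutes) offered in the menu, as listed in the text.
ALLOWED_LENGTHS_MIN = (5, 15, 30)


@dataclass(frozen=True)
class PracticeConditions:
    """One complete menu selection (hypothetical type, for illustration only)."""
    room: Room
    formality: Formality
    capacity: Capacity
    length_min: int

    def __post_init__(self):
        # Reject lengths that the menu does not offer.
        if self.length_min not in ALLOWED_LENGTHS_MIN:
            raise ValueError(f"length must be one of {ALLOWED_LENGTHS_MIN} minutes")


def all_conditions():
    """Enumerate every combination that the menu can produce."""
    return [
        PracticeConditions(room, formality, capacity, minutes)
        for room in Room
        for formality in Formality
        for capacity in Capacity
        for minutes in ALLOWED_LENGTHS_MIN
    ]
```

Under these assumptions, two rooms, two formality levels, three capacities and three lengths give 36 distinct practice configurations.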

The user is then loaded into the chosen practice environment with their chosen practice conditions. Both environments include key VR mechanics that have been mapped to factors affecting non-verbal communication (see Sect. 3.2)—a wall clock to display and represent the chosen time condition and the audience whose attire matches the formality of the chosen environment and conditions. The user is also provided with a laptop object that allows for uploading presentation slides that can be used within the VR simulation (Figs. 2 and 3).

The avatar audience is randomly seated in the chosen environment in each VR simulation run, so that the user can practice with a different audience layout. Semi-realistic, animated, 3D avatars were used for the audience in line with the need to include realistic entities to achieve a simulated realism and increase engagement (Sutcliffe and Gault 2004). The representation of each audience avatar member differs for each environment, depending on setting selection, including the clothing that they are wearing as well as their postures. This is done to show a clear representation of the different formalities within each environment. Specifically, the user can select the boardroom or the auditorium settings, and if they then choose to practice in formal conditions, individuals in the audience will be wearing suits and shirts and paying close attention to the user (Fig. 6), whilst if they select informal practice conditions, then the audience is dressed more casually (Fig. 7). The animations of avatars in the formal setting are selected to present more formal postures so that the audience appears to be listening intently, whilst the animations of the avatars in the informal setting are more relaxed and may have frames where the audience is not intently focused on the speaker. For example, the animations in the informal setting include frames where the avatar may lose eye contact, but the animations for the formal setting do not. Formality is therefore used as a behavioural representation of the avatar audience (e.g., posture), as well as a reference to the avatar attire (e.g., suit vs casual) and the room setting, i.e., a boardroom is a more formal setting compared to an auditorium, which is typically more informal.
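The random seating and the formality-dependent audience described above could be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the study's code: the function names, the seat identifiers and the returned labels are hypothetical, and the text does not specify how many avatars correspond to each capacity level, so the audience size is passed in as a parameter.

```python
import random


def seat_audience(seat_ids, audience_size, rng=None):
    """Pick `audience_size` distinct seats at random for one simulation run."""
    rng = rng or random.Random()
    seats = list(seat_ids)
    if audience_size > len(seats):
        raise ValueError("audience is larger than the number of seats")
    # sample() draws without replacement, so no seat is used twice.
    return sorted(rng.sample(seats, audience_size))


def avatar_style(formality):
    """Map the chosen formality to attire and animation behaviour.

    The labels are illustrative; the text only states that formal avatars
    wear suits and keep eye contact, whilst informal avatars dress casually
    and may lose eye contact in some animation frames.
    """
    if formality == "formal":
        return {"attire": "suit", "may_break_eye_contact": False}
    if formality == "informal":
        return {"attire": "casual", "may_break_eye_contact": True}
    raise ValueError(f"unknown formality: {formality!r}")
```

Passing a seeded `random.Random` instance makes a layout reproducible for testing, whilst omitting it gives a different layout on each run.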

Fig. 2 Boardroom concept visualisation

Fig. 3 Auditorium concept visualisation

Whilst delivering their presentation, the user is able to receive live instruction-based feedback relevant to their non-verbal communication performance based on their gaze, posture and position. The user is expected to stand during the presentation; however, we acknowledge that some users may tend to take a seat and be hesitant to stand. Therefore, if the simulation detects that the user is seated, they are shown a message reminding them to stand up straight. As previously mentioned, this addresses the posture detection mapping for the shyness factor (Table 2). The simulation detects a user as seated based on the head height reported by the HMD with reference to the simulated laptop: if the top of the head is level with or below the top of the laptop screen, the user is considered to be seated. This approach, however, does not cater for users of lower-than-average height or wheelchair users.
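The seated-posture rule described above reduces to a single height comparison, which can be sketched in Python as follows. The function names, the optional `tolerance` parameter and the wording of the reminder message are assumptions made for illustration; only the comparison against the top of the laptop screen comes from the text.

```python
def is_seated(head_top_y, laptop_screen_top_y, tolerance=0.0):
    """Return True when the top of the head is level with or below the
    top of the simulated laptop screen.

    Both heights are assumed to be in metres in the same coordinate frame.
    `tolerance` is a hypothetical margin that is not mentioned in the text.
    """
    return head_top_y <= laptop_screen_top_y + tolerance


def posture_cue(head_top_y, laptop_screen_top_y):
    """Return a reminder message when the user appears seated, else None."""
    if is_seated(head_top_y, laptop_screen_top_y):
        return "Please stand up straight"  # placeholder wording
    return None
```

Because the rule uses a fixed reference height, it shares the limitation noted above: a standing user whose head is below the screen top would also be classified as seated.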

Similarly, if the user’s gaze is averted from the audience during the presentation, they are shown a message to encourage them to look at the audience, show confidence and share their gaze so that they are not focused on a specific group or area (Fig. 9). The VR simulation also provides positive encouragement by showing messages when the user is doing well so that they feel reaffirmed and connected to the simulation, which helps address the lack of confidence factor. For example, the displayed feedback provides the user with live presentation cues, such as “maintain audience focus” if they are staring at the floor (Fig. 8), which can also help them develop more positive habits (de Bruijn et al. 2009), as well as with positive reinforcement cues, such as offering praise, which has been shown to result in increases in confidence, which in turn results in commitment to change practice (Lucero and Chen 2020). It has to be noted that in this scenario there is an intentional timer before displaying feedback to allow the user to look around freely, so cues are only displayed if they are not focused on the audience for extended periods of time, e.g. longer than 45 s. To detect the user’s gaze, the audience area is divided, based on distribution, into three regions (right, left and centre), each defined by a box collider. We cast a ray (Raycast) forward from the user’s head position. If the ray does not hit any of the three colliders for more than 25 s, we detect that the user is not looking at the audience and encourage them to maintain audience focus. If the ray lingers on any single region for more than 45 s, we detect a fixation and encourage them to shift their focus to the other regions of the audience.
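The two gaze timers described above can be sketched as a small, engine-agnostic state machine in Python. This is an illustration under stated assumptions rather than the study's implementation: the ray cast against the three box colliders is left to the engine, and its result is passed in as `region`; the class name, the region labels, the wording of the second cue and the decision to restart the timer after each cue are all assumptions.

```python
OFF_AUDIENCE_LIMIT_S = 25.0  # no region hit for longer than this: focus cue
FIXATION_LIMIT_S = 45.0      # same region hit for longer than this: share-gaze cue


class GazeMonitor:
    """Tracks how long the gaze ray has stayed in its current state."""

    def __init__(self):
        self.current = None  # region hit by the last ray, or None for no hit
        self.elapsed = 0.0   # seconds spent continuously in that state

    def update(self, region, dt):
        """Advance by `dt` seconds.

        `region` is 'left', 'centre', 'right', or None when the ray hits
        none of the three audience colliders. Returns a cue string or None.
        """
        if region != self.current:
            # The gaze moved to a different state, so the timer restarts.
            self.current, self.elapsed = region, 0.0
            return None
        self.elapsed += dt
        limit = OFF_AUDIENCE_LIMIT_S if region is None else FIXATION_LIMIT_S
        if self.elapsed > limit:
            self.elapsed = 0.0  # assumption: restart the timer after each cue
            if region is None:
                return "maintain audience focus"
            return "share your gaze with the whole audience"  # placeholder wording
        return None
```

In an engine loop, `update` would be called once per frame with the frame time as `dt`; a speaker who keeps moving their gaze between regions never accumulates enough time in one state to trigger either cue.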

Once the presentation is finished, the user is taken back to the lobby and is shown a breakdown of the cues they were shown and their timings, so that they can review them in order to improve. This includes the positive encouragements, which act as rewards that positively impact their confidence.

Fig. 4 Main menu incorporated into a TV screen

Fig. 5 User interface allowing the user to choose their specific settings, including location, formality and length of presentation

Fig. 6 A formal boardroom with a medium-sized audience for a 30 min presentation

Fig. 7 An informal presentation in an auditorium for 5 min with a medium-sized audience

Fig. 8 Message to the user, when they are avoiding eye contact, to maintain audience focus

Fig. 9 Message to the user to share their gaze with all the audience and not to focus on a certain area of the presentation room

4 Evaluation and findings

The hypothesis being tested in this work is that PublicVR is an effective intervention that positively changes an individual’s perceptions towards those non-verbal communication factors that affect public speaking anxiety. Accordingly, an empirical user evaluation was carried out to validate the ability of PublicVR to:

  1. Positively change participants’ perception of anxiety toward the implemented factors, measured through a change in an individual’s anxiety level stemming from these factors;

  2. Induce a sense of presence in our VR, measured through the participant’s subjective experience when interacting with PublicVR.

4.1 Procedure and instruments

The user evaluation involved participants using the VR simulation and was designed to have three stages: (i) the briefing and screening stage, (ii) the user study stage, and (iii) the presence assessment stage.

In stage (i), all participants were briefed about the purpose of the study and they underwent an initial screening where their perception of their anxiety level towards public speaking was self-reported before experiencing the VR simulation, using the Personal Report of Public Speaking Anxiety (PRPSA) (McCroskey 1970), which effectively helps to determine an individual’s fear toward public speaking. Participants who scored above 98 on PRPSA were considered as having a self-reported moderate to high anxiety (McCroskey 1970), and were therefore eligible to be part of the Moderate/high anxiety group. Participants with a self-reported low anxiety (i.e. below 98 on PRPSA) did not meet our anxiety threshold criteria, and were therefore put into the Low anxiety group (Table 4). This first stage lasted approximately 20 min. We hypothesise that if eligible participants’ self-reported perception of their anxiety level towards public speaking decreases (i.e. below 98 on PRPSA) as a result of using the VR simulation, then their perception of their anxiety related to the implemented non-verbal communication factors at this particular moment in time has successfully changed.
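The group assignment at screening can be sketched as a simple threshold rule. This is an illustration only; the function name is hypothetical, and because the text does not state how a score of exactly 98 was handled, the sketch assigns it to the Moderate/high group as an assumption.

```python
PRPSA_THRESHOLD = 98  # screening cut-off used in this study


def prpsa_group(score: int) -> str:
    """Assign a participant to an anxiety group from their PRPSA score.

    Assumption: a score of exactly 98 is treated as Moderate/high; the
    source only specifies 'above 98' and 'below 98'.
    """
    if score >= PRPSA_THRESHOLD:
        return "Moderate/high"
    return "Low"
```

For instance, a screening score of 111 falls in the Moderate/high group and a score of 83 in the Low group.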

In the second stage (stage ii), we followed a within-subject design (both groups went through all the conditions), so that we could compare the effect of PublicVR on the participants with reported moderate to high anxiety against the participants with reported low anxiety. Initially, participants were asked to complete a demographics questionnaire and then to use the VR simulation freely to familiarise themselves with the content and interaction mechanisms. Next, they were assigned to the test scenario, consisting of three five-minute presentations with (a) a low audience size in an informal setting, (b) a medium audience size in an informal setting, and (c) a high audience size in a formal setting. The audience size for each location is presented in Table 3 and is in line with the large (more than 35), medium (9–35), and low (1–8) audience sizes reported for similar settings in previous studies (Anderson et al. 2013; Lemasson et al. 2018). We applied these ratios to the capacity of the boardroom, i.e. 14 seats around the table, to assign its audience sizes. The location for each presentation was randomly assigned between auditorium and boardroom. We kept the length, formality and population of each scenario the same for all participants so that the results are comparable. The scenarios also build up from less stressful to more stressful conditions, as we wanted to simulate an environment where the user exhibits high levels of self-attention.

Upon arrival, each participant was seated at the laptop, on which the VR simulation was preloaded, and put on the HMD. A presentation slide deck was preloaded on the laptop object and was used by all participants to deliver their public talk. Each participant was given 10 min to familiarise themselves with the topic and the slides, which were the first five slides of Principles of EU Environmental Law (European Commission 2022) and included presenter notes to help the participants prepare. Whilst in a test scenario environment, participants were asked to deliver a talk as they normally would, using the provided presentation slide deck. As previously mentioned, each participant experienced three test scenario environments that differed in layout, formality, and capacity. However, the interaction technique used by participants was the same across all test scenario environments. The user evaluation task involved participants trying to deliver their talk within the allocated time limit shown on the clock on the wall, in accordance with the VR scenario. At the end of this stage, participants were asked to self-report their perception of their anxiety level again using the PRPSA. Finally, brief semi-structured follow-up interviews were carried out to further explore participants’ thoughts about using PublicVR for the intended purpose. The user study was carried out in a lab setting in a controlled environment using a pre-configured laptop. Each user test session lasted approximately 60 min, depending on the participant.

Table 3 Audience size for boardroom and auditorium for low, medium and high

In the final stage of our user evaluation (stage iii), once participants had finished the user task, they were asked to complete the iGroup Presence Questionnaire (IPQ), which was used to gain insight into their perceived sense of presence when using PublicVR. The IPQ comprises three subscales (Spatial Presence, Involvement, and Experienced Realism) and one additional general item, not belonging to a subscale, which assesses the ‘sense of being there’ (Nichols et al. 2000). This final stage lasted approximately 10 min. Figure 10 illustrates the steps and stages in our user evaluation design.

Fig. 10 The three stages and their steps for our user evaluation

4.2 Participants

The study protocol was approved by the authors’ home institution Research Ethics Committee (0636-LR-Nov/2022- 42000-1) and was carried out in December 2022. Written informed consent was obtained from all participants before starting the evaluation. The participants were recruited from the student population at the authors’ home institution through a call for unpaid volunteers who had experienced anxiety in public speaking. The study included 25 participants, aged between 19 and 25 (\(M=21.56\), \(\sigma = 1.44\)), which included 18 males and 7 females. Table 4 presents the distribution of participants and the details of their screening PRPSA results.

Table 4 Screening PRPSA result details and the distribution of participants in the Moderate/high and Low anxiety groups

4.3 Analysis and results

The results of the post-task PRPSA illustrate a decrease in the perceived anxiety in the Moderate/high anxiety group and a general decrease in the Low anxiety group, with two slight increases in the outliers. Table 5 presents the distribution of participants and the details of their post-task PRPSA results. The screening and post task results of PRPSA scores across both Moderate/high and Low anxiety groups are demonstrated in Fig. 11.

Table 5 Post-task PRPSA result details and the distribution of participants in the Moderate/high and Low anxiety groups
Fig. 11 Boxplot presenting the screening and post-task results of PRPSA scores across both Moderate/high and Low anxiety groups. Participants 9 and 10 reported the highest PRPSA scores post-task, whilst participant 19 reported the lowest

Overall, the Moderate/high group’s post-task PRPSA scores were on average 17% lower than their screening PRPSA scores. The maximum decrease was 27.9%, in a participant whose score dropped from 111 at screening to 80 post-task. The minimum decrease was in a participant who scored 121 post-task against 125 at screening. This was further confirmed by the follow-up interviews: all participants from the Moderate/high group expressed that, as they progressed through the three scenarios, they felt more confident and less anxious. The participants also felt that the experience closely resembled public speaking in the real world.
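The percentage decreases quoted above follow from the relative change between the screening and post-task scores. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and only the two score pairs quoted in the text are used.

```python
def percent_decrease(screening: float, post_task: float) -> float:
    """Relative decrease from the screening score to the post-task score, in percent."""
    return (screening - post_task) / screening * 100.0


# The two Moderate/high participants quoted in the text.
largest = percent_decrease(111, 80)    # (111 - 80) / 111
smallest = percent_decrease(125, 121)  # (125 - 121) / 125

print(round(largest, 1))   # 27.9
print(round(smallest, 1))  # 3.2
```

A negative result would indicate an increase in the reported anxiety score.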

The Low anxiety group also showed an average 4.8% decrease in their reported perceived anxiety levels. Two participants in the Low anxiety group had slight increases in their post-task PRPSA scores: one increased by 5, from 83 to 88, and the other by 4, from 91 to 95. In the follow-up interview, they expressed some anxiety about using VR technology and said that they had anticipated cybersickness because of what they had heard. However, neither of them reported any symptoms of cybersickness after the study. The smaller decrease in post-task PRPSA scores in the Low anxiety group is a result of the lower initial scores in the screening PRPSA: these participants’ perceived anxiety levels were not high to begin with, so the intervention could not have a substantial effect on them.

Based on these results, it is possible to conclude that PublicVR was effective at positively changing participants’ perception of anxiety toward non-verbal communication related to the implemented factors, as evidenced by reductions in self-reported perceived anxiety levels averaging 13.15% across all participants.

4.4 PRPSA validity and statistical significance

To assess the validity and statistical significance of the results, both the screening and post-task PRPSA results were analysed through homogeneity and normality tests followed by a T-Test. The homogeneity of variances test indicated homogeneous variances, with significance values of 0.056 and 0.187 for the screening and post-task results, respectively, both above the 0.05 threshold. Table 6 presents the results of the homogeneity of variances test.

Table 6 Results of homogeneity of variances test on both screening and post-task PRPSA

In the next step of the analysis, the data were tested for normality. As presented in Table 7, the significance values for the screening and post-task PRPSA results of both the Moderate/high and Low anxiety groups are above the alpha threshold of 0.05. Hence, the data can be treated as normally distributed and used in the parametric T-Test.

Table 7 Results of normality test on both screening and post-task PRPSA, across the Moderate/high and Low anxiety groups

Finally, a paired sample 2-tailed T-Test was performed to establish the statistical significance of the results of the screening and post-task PRPSA tests. As demonstrated in Table 8, the significance for the total population and the Moderate/high group is lower than the 0.05 threshold, therefore, they are statistically significant. Hence, it is possible to conclude that PublicVR has been successful in positively changing the participants’ perceived anxiety towards non-verbal communication related to the implemented factors. We performed an additional two-way repeated measures ANOVA, with results presented in Tables 9 and 10. These also confirm that the experiment results are statistically significant.
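For readers unfamiliar with the paired-sample T-Test, the statistic is the mean of the per-participant differences divided by its standard error. The sketch below uses only the Python standard library and is not the analysis pipeline used in the study. As input it uses the four individual score pairs quoted in Sect. 4.3, which are far too few to reproduce Table 8 and are included purely to show the mechanics; the function name is hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Return the paired-sample t statistic and its degrees of freedom."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Illustration only: the four screening/post-task pairs quoted in the text.
screening = [111, 125, 83, 91]
post_task = [80, 121, 88, 95]
t_value, dof = paired_t_statistic(screening, post_task)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

The two-tailed significance is then obtained from the t distribution with n − 1 degrees of freedom; a statistics package (for example, `scipy.stats.ttest_rel`) reports the statistic and the p-value together.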

An additional post-hoc power analysis was conducted, shown in Table 11, which demonstrates that the effect of the intervention was very high on the Moderate/high group. The analysis also shows, as expected, that the effect was low on the Low anxiety group, which is still an interesting result. This can be partly due to the small sample of the Low anxiety group which is an acknowledged limitation of this study.

Table 8 2-tailed paired sample T-Test for the screening and post-task PRPSA responses
Table 9 Multivariate tests results of PRPSA responses on two factors: (a) Screening and Post task, (b) Low or Moderate/high anxiety
Table 10 Tests of within-subjects effects results of PRPSA responses on two factors: (a) Screening and Post task, (b) Low or Moderate/high anxiety
Table 11 Post-hoc power analysis of moderate/high and Low anxiety groups

We note that in this work we compared the Low anxiety group against the Moderate/high group in order to see the impact of our solution on the participants’ perceptions that influence non-verbal communication. Measuring the impact of individual factors with varied levels and/or the causes of this impact was outside the scope of this work by design and was therefore not considered.

4.5 Analysis of follow-up interviews

The follow-up interviews were conducted with each participant to gain further insights into their preferences, the features of PublicVR, and their perception regarding these. The researchers took notes during the interviews and the notes were then transcribed and analysed.

The participants remarked on the immersive aspects of the VR experience and found that it felt like a real scenario of speaking in public. Two participants from the Low anxiety group expressed that, whilst they enjoyed the VR experience, they were still aware that it was a simulated scenario. The features most highlighted by the participants were the messages for sharing their gaze and the timer. Almost all participants from both groups (24/25 participants) remarked, unprompted, that these two features helped them focus on skills that they had not considered before, i.e. focus sharing and time keeping, which helped boost their confidence. Specifically, participant 14 (P14) stated: “I realised I wasn’t really looking at the audience and staying in my own space, but by the third scenario in the auditorium, I felt I was in control of the presentation and was there with the audience”. The messages which reminded them not to look down and not to focus on only one section of the audience helped them understand how to show confidence and overcome some of the anxiety they had when it came to public speaking. Participants also found the timer feature useful and interesting. They mentioned that, while the timer was a source of anxiety at the start, they felt more comfortable seeing it as the scenarios progressed and they started to understand how to pace their presentation to keep to time. It also gave them more confidence in the third scenario, as they felt they were ready and could see that they were progressing well.

Overall, the comments from the follow-up interviews were positive. Participants highlighted that the dynamic audience in the scenes helped them to believe they were in a real presentation and gave them a good sense of presence. Although they mentioned that they didn’t feel the audience were always responding to the events, they still found them an interesting feature which helped them with their confidence.

4.6 IPQ results and analysis

The results of the IPQ (stage iii) were used to assess the sense of being present in the VR simulation. The questionnaire used a 7-point Likert scale (fully disagree \(=-3\) to fully agree \(=3\)). Questions 3, 4 and 9 were reverse-scored, as they are negatively worded, so that the final scores could be added. The results for each of the questions in the IPQ are presented in Fig. 12 and the results for each subscale are presented in Fig. 13 and Table 12. As demonstrated in the figures and table, the participants found PublicVR to create a high level of presence. Specifically, each IPQ subscale in our study achieved an average score of more than 1.5 (corresponding to more than 4.5 in a 1–7 Likert scale), which is described as having “very good presence” compared to other studies exploring VR experiences with the IPQ score (Melo et al. 2023). Participants found PublicVR to create a real feeling of being there, which is also confirmed by their comments in the follow-up interviews.
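On a scale centred on zero, reverse-scoring a negatively worded item amounts to flipping its sign. The sketch below illustrates this step for questions 3, 4 and 9; the function names and the dictionary layout are hypothetical, and the mapping of questions to subscales is deliberately left out because it is not given here.

```python
from typing import Dict

REVERSED_ITEMS = {3, 4, 9}  # negatively worded questions, as stated in the text


def score_ipq(responses: Dict[int, int]) -> Dict[int, int]:
    """Map question number -> scored response on the -3..3 scale,
    flipping the sign of the reverse-scored questions."""
    scored = {}
    for item, value in responses.items():
        if not -3 <= value <= 3:
            raise ValueError(f"response for question {item} is outside -3..3")
        scored[item] = -value if item in REVERSED_ITEMS else value
    return scored


def mean_score(scored: Dict[int, int]) -> float:
    """Average of the scored responses, e.g. over the questions of one subscale."""
    return sum(scored.values()) / len(scored)
```

A participant who fully disagrees (−3) with a negatively worded question therefore contributes +3 to the corresponding average.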

Fig. 12 Bar chart illustrating the average score on the 14 questions in IPQ

Fig. 13 Bar chart illustrating the average scores for each subscale in IPQ

Table 12 Average and standard deviation for each subscale in IPQ

5 Concluding discussion

In this paper we presented PublicVR, a Virtual Reality Exposure Therapy intervention proposed to support adults with self-reported speech anxiety in reducing the anxiety related to their non-verbal communication skills. This study contributes to the limited body of knowledge on non-verbal communication skills. We reported an empirical evaluative study which investigated the efficacy of PublicVR and the sense of presence that participants experienced in our Virtual Environment. Our results suggest that PublicVR is an effective tool with the potential to positively change perceived anxiety related to non-verbal communication skills. Specifically, our findings indicated that PublicVR positively changed the participants’ self-reported perceived anxiety towards non-verbal communication in a public speaking setting, evidenced through a decrease in their perceived anxiety levels (Table 5 and Fig. 11). Participants also reported a high sense of presence in our VR simulation experience (Figs. 12 and 13). These results are confirmed by the follow-up semi-structured interviews, where participants highlighted the positive impact of PublicVR on the anxiety related to their non-verbal communication skills, specifically gaze and time keeping.

A number of contributions arose from this work. Specifically, past clinical studies indicated that VR can be an efficient approach to overcoming public speaking anxiety (Lee et al. 2020). In this work, we contributed to this body of knowledge by demonstrating that VR can further positively change self-reported anxiety levels resulting from the four specific public speaking factors investigated in this work: Peer anxiety, Stage fear, Lack of confidence, and Shyness. This is particularly important, as past work has shown that increases in public speaking confidence resulting from VR interventions can transfer to real-life situations and were retained after the study had ended (Safir et al. 2012). In fact, at the end of this study our participants indicated that they had realised the importance of gaze and time-keeping in delivering a public talk. We therefore expect that our results can have a longer-lasting effect on users of PublicVR.

Our work further demonstrated the importance of receiving live positive feedback/instructions in the process of public speaking training. Specifically, our participants indicated the usefulness of receiving live positive feedback and supported that it was a significant factor for the improvement in their anxiety levels towards their non-verbal communication. In fact, previous research has shown that positive feedback increases both performance (Hattie and Timperley 2007) and self-efficacy (i.e. one’s ability to successfully cope with future demands) (Brown et al. 2012), which has been shown to moderate the negative effects of stress (Schönfeld et al. 2016). In addition, it has been demonstrated that positive feedback through positive reinforcement, for instance through verbal praise in our work, increases the probability that a targeted behaviour (e.g. improve anxiety perceptions of non-verbal communication skills in our work) will occur (Feist et al. 2006).

Along the same lines, providing immediate (or ‘live’ in our case) feedback during the learning process has also been shown to be an effective way to improve performance (Perera et al. 2008). In fact, real-time feedback in VR, as well as in Augmented Reality (AR) and Mixed Reality (MR), has been found to have positive effects in many application areas (Geisen and Klatt 2022). In contrast, in real-life (non-VR) training conditions participants are often not able to receive similar actionable feedback in real time without interrupting the presentation flow. In our work, we demonstrated that PublicVR offers a high perceived sense of presence, where the provision of live feedback does not have a negative impact on the presentation flow and maintains participants’ involvement with the task at hand. Nevertheless, in comparison to other application areas, more research is needed on live feedback in public speaking training, especially considering the multifaceted nature of non-verbal public speaking skills, which are affected by a number of both internal and external factors (Sect. 2.2). Indeed, past work has explored the use of different feedback approaches in public speaking training (Chollet et al. 2015; Schneider et al. 2015); however, most efforts focused on individual and fragmented elements of the training experience (e.g. presenter’s voice quality, timing, or audience). In this work we demonstrated that visual real-time feedback is a helpful approach. However, multimodal real-time feedback (i.e. a combination of visual, audio and haptic/vibrotactile feedback) may provide further benefits by addressing the different contributing factors in a more holistic manner, and should be explored further.

Finally, it is accepted that public speaking anxiety has been classified as a social phobia and as an anxiety-related disorder (Bell 1994). Over the years, numerous approaches have been proposed to deal with public speaking anxiety, including technology-enhanced methods. The use of mobile apps for peer feedback (Shamsi et al. 2019), Artificial Intelligence (AI)-generated feedback (Chen 2022) or VR as a clinical tool in mental health research and practice (Bell et al. 2022) is not new. There are also a number of commercial applications for the same purpose. However, there is a lack of empirical evidence and design-led studies on the use of technology for public speaking anxiety. Our work therefore contributes to the ongoing efforts for empirical studies in this field and further confirms the usefulness of VR as an exposure therapy approach to reducing public speaking anxiety, as recently reported by Lindner et al. (2019).

Our findings present certain limitations. First, we acknowledge that the number of participants with self-reported speech anxiety was small, so further studies with a larger number of participants are needed. We also acknowledge that the Low anxiety group was smaller than the Moderate/high group, which has affected the effect and significance of the results for the Low anxiety group. Nevertheless, the insights at this stage of our work have been very useful for understanding the potential impact of PublicVR and will be used in our future efforts. Second, all results in this study were participant self-reports, which are subjective in nature and cannot provide a holistic account of participants’ experiences of using PublicVR. Adding objective physiological measures, such as eye gaze, Electroencephalography or Galvanic Skin Response, can provide additional meaningful insights, so this is also part of our future research directions. Third, we did not investigate participant responses per test scenario or the impact of individual factors in this work, as one of our main objectives was to measure the effectiveness of PublicVR in addressing the identified factors and not to measure the impact of any changes in test scenario conditions as a result of using PublicVR. We also wanted to avoid negatively affecting participants’ perception of presence as a result of measuring each test scenario separately. This, however, constitutes an avenue for future work. Finally, this study did not report on the long-term effectiveness of PublicVR, as it was outside the reported scope. Accordingly, another main avenue for future work is a longitudinal study to fully understand the mechanisms by which VR reduces speech anxiety symptoms and to identify the VR interventions that are most effective in the long term for different individuals. In addition, we will investigate how the different test scenarios presented in this work compare in terms of presence, which we expect will produce further interesting insights. Overall, this work can contribute to ongoing efforts to determine the effect of Virtual Reality Exposure Therapy interventions on the treatment of social phobias.