1 Introduction

The coronavirus disease 2019 (COVID-19) pandemic has clearly driven the recent trend towards digitalising healthcare, but the concept of digital healthcare already existed before this crisis (Masneri et al. 2023; Pedram et al. 2023; White and Shaban-Nejad 2022). The healthcare sector by its very nature has always been connected to technological advances, so it has generated retrospective feedback on all aspects of the digitalisation process. Technology has had an impact on medicine in many different areas, ranging from purely technical features (e.g. interventions and treatments) (Cavanagh et al. 2022) to largely relational aspects (e.g. provision of information to patients) (Oliver Wyman 2021).

Medicine has thus become increasingly linked to big data (Aebi et al. 2021), artificial intelligence (AI) (Morales-Lara and Adedinsewo 2022), telediagnosis (Galván et al. 2021; Minervini et al. 2022) and computer systems, among other innovations. Another digital innovation applied in medicine is virtual reality (VR). For example, RelieVRx (formerly known as EaseVRx) was authorised by the United States Food and Drug Administration for sale as part of treatments for moderate to severe lower back pain (Giravi et al. 2022; Rubin 2021). These major advances not only provide advantages in the present but also promise great benefits in the long term. Healthcare innovation is undoubtedly here to stay.

VR deserves special attention among the many ways that digitalisation and technology can contribute to education because of the novelty and possible uses of VR applications (Calvert and Hume 2023; Chang et al. 2023; Howard and Davis 2023; Lau 2023; Stefan et al. 2023). Healthcare is one of the fields in which VR is most expected to play a vital role due to its wide range of applications (Aiello et al. 2023; Pieterse et al. 2023). VR can provide assistance in diverse medical areas including remote procedures (Matsangidou et al. 2022) and treatments of problems such as phobias (Freitas et al. 2021), obesity (Anastasiadou et al. 2022) or Alzheimer’s disease (Kruse et al. 2022).

VR usage in medical training is particularly interesting. VR, healthcare and education in combination offer many possibilities for the creation of immersive simulations or simpler three-dimensional (3D) visualisations of parts of the body (Im et al. 2023). VR applications can allow a larger number of medical professionals and students to have access to the same training course, shorten their training schedule (Pottle 2019; Oxford Medical Simulation 2019) and increase their motivation and interest in the topics being studied (Sattar et al. 2020). Because of these benefits, the demand for VR tools is growing in hospitals and universities.

VR applications in medicine are revolutionising the way medical professionals diagnose, treat and educate patients (Fertleman et al. 2018). For instance, VR can be used to conduct mirror visual feedback (MVF) treatments, which provide visual feedback to alleviate phantom limb pain or improve motor function in a more engaging, realistic manner (Bullock et al. 2020). VR also increases the flexibility of exposure therapy (ET), which involves simulated exposures to a feared situation or object (Ezendam et al. 2009). This technology is also often an effective tool in learning processes (Lee and Wong 2008), so the current research took on the important task of determining what key factors affect the acceptance of VR in medical training and how these variables work in this context.

The success of this type of technology depends on the medical community’s perception of VR applications as valid, functional, accessible and valuable. In other words, their use in medical training needs to be accepted by the relevant doctors. Concurrently, medical students must be interested in using these devices and training simulators or the benefits associated with applying VR in educational settings will remain unrealised. Overall, students must prioritise their resources (e.g. time and money). If VR is perceived as a time- and money-saving tool, these individuals will be more likely to use the related devices (Fussell and Truong 2022).

The literature includes few studies that have focused on this topic, and the limited research on VR and medicine has assumed that healthcare professionals will generally adopt these new techniques without hesitation given the clear benefits of this technology (Lange et al. 2020). To address these gaps, the present investigation sought to identify the most crucial factors driving doctors and medical students’ intention to use VR in their learning and training. This study thus concentrated on addressing the deficiencies in prior research pointed out by various authors (Lin and Peh 2019; Sagnier et al. 2020) in order to clarify which determinants drive medical students to use VR in their learning.

Aliwi et al.’s (2023) review of the literature on VR in medical practice identified additional methodological gaps related to techniques, sample size and outdated equipment. The cited authors, therefore, called for more empirically rigorous assessments of VR technology in healthcare training. To respond to this appeal, the present study applied a multidimensional approach that included the elements of context, content, process and individual differences, with reference to Davis’s (1989) technology acceptance model (TAM) and Venkatesh et al.’s (2003) unified theory of acceptance and use of technology (UTAUT). The present literature review included examining these models, both of which have been widely applied in related research.

A mixed-methods approach was selected for this research given the novelty of healthcare VR technology and the absence of qualitative studies specifically focused on this topic. The first phase concentrated on gathering qualitative data from healthcare experts to identify which variables needed to be assessed. In the second phase, a quantitative study was conducted using structural equation modelling (SEM) to find which factors have a significant effect on the continued use of VR as a teaching method in healthcare.

The results provide stimulating ideas that can help increase VR usage and strengthen healthcare providers’ commitment to VR applications. The selected methods effectively identified the factors influencing the chances that VR will be well received – given the existing demand in healthcare – including the drivers of and obstacles to the continued use of VR. In addition, the findings provide information that companies offering this technology can apply to design and market learning tools of value to users, as well as highlighting the variables that may indicate a greater or lesser proclivity for using VR. Instructors and curriculum coordinators can also use this expanded knowledge to design programmes that prepare students to use VR in their medical training and encourage them to adopt VR for personal use. In this way, these individuals will gain greater familiarity with and enjoyment of this technology during their learning experiences.

2 VR usage in healthcare training

The term VR first appeared in 1987 when Jaron Lanier, together with Tom Zimmerman, developed the data glove. More recently, Lopreiato et al. (2016) defined VR as follows:

VR is a wide range of applications based on computers commonly associated with immersive, highly visual 3D characteristics that allow the participant to look around and navigate inside a world that is apparently real and physically present. VR is generally defined based on the type of technology used, such as visual display units placed on the head, stereoscopic vision capacity, input devices and the number of sensory systems stimulated. (p 40)

Experts, however, continue to debate how best to conceptualise VR. This technology has developed quite naturally, causing its definition to expand and evolve in parallel ways and inspiring scholars to integrate new perspectives into its conceptualisation.

VR has been used in healthcare to help patients recover their mobility (Yeo et al. 2019), to treat phobias (e.g. Psious VR or Amelia) (Freitas et al. 2021) and migraines (e.g. GlaxoSmithKline or GSK) (Vekhter et al. 2020) and to facilitate medical training (Zhao and Li 2022). VR applications in medical education allow a larger number of medical professionals and students to gain access to the same training session and to participate as often as they want (Oxford Medical Simulation 2019). These benefits have increased the demand for VR tools in hospitals and universities.

Academic research on VR in healthcare has reflected the medical community’s growing interest, expanding significantly in recent years and focusing largely on applications of this technology in health services for individuals with disabilities (De Ribaupierre et al. 2014; Levin et al. 2014; Maples-Keller et al. 2017; Sveistrup 2004). VR applications simulating different surgeries also encompass a wide range of procedures and treatments. Other applications include MVF and ET, as well as simulators of dental and bone surgery, intubation, body, eye surgery and endoscopic techniques (Ruthenbeck and Reynolds 2015). The medical literature has long acknowledged that VR can be used to enhance capabilities and train different medical groups based on their individual and collective needs (Samadbeik et al. 2018).

In the field of education, other studies have addressed the question of how VR enhances medical training (King et al. 2018; Lie et al. 2022; Mantovani et al. 2003; Pottle 2019) and have concentrated on analysing the impact VR has on learning. According to Pottle (2019), the main benefits are improvements in students’ learning curve and access to courses. King et al. (2018) and Mantovani et al. (2003) highlight the benefit of a greater absorption capacity. Students who are taught via VR retain more information and apply what they have learned more fully as compared to the results obtained with other methods (Krokos et al. 2018). VR also improves medical students’ ability to learn by awakening a greater interest in – and motivation to study – the relevant topics (Sattar et al. 2020).

However, prior studies have ignored other aspects of VR usage in healthcare professionals’ training. Only a few recent investigations thus far have focused on the main factors affecting the acceptance of VR technology in medical learning (Ustun 2020; Cabero-Almenara et al. 2023; Sprenger and Schwaninger 2021; Ustun et al. 2022). No researcher, however, has applied the UTAUT and TAM to confirm how well VR applications meet needs in healthcare learning from the students’ perspective.

2.1 Students’ attitudes towards medical learning with VR

Before incorporating VR into educational programmes, educators must understand the factors that motivate students to use it. VR is a relatively new technology that has not yet been widely adopted, so only a small percentage of students will have had any VR experience before using it as a learning tool. Most students thus need to familiarise themselves with these technologies. This preparation can impact their acceptance of VR technology, especially in terms of behavioural intention to use, which in turn could affect students’ level of engagement and learning (Sagnier et al. 2019; Shen et al. 2019a, b).

Many educators assume that VR will benefit students while training and that they will accept this technology without hesitation. However, VR applications can also have a negative impact on students’ learning process (Makransky et al. 2017). For instance, these individuals can become frustrated if they are unfamiliar with, or inexperienced in using, this technology, which can adversely affect their learning performance when utilising VR (Maraj et al. 2015). According to Fussell and Truong (2021), students may also only embrace VR in a dynamic learning environment if they understand how this technology works and how it will benefit their education.

The literature, therefore, indicates that user acceptance and motivation for use are crucial factors to consider when explaining the effectiveness of VR technologies in medical education (Aggelidis and Chatzoglou 2009; Barteit et al. 2021; Beke Hen 2019; Holden and Karsh 2010). Previous studies have found evidence that medical students tend to show high levels of participation, which has a positive effect on how well these individuals learn. Désiron et al. (2022) assert, nonetheless, that the adoption of these new technologies in medical training remains an open question.

2.2 Adoption of new technologies

Studies of technology acceptance have referred to the available general models either directly or by adapting them to multiple technological contexts with new factors and relationships between these. Various authors (Kim and Crowston 2011; Oliveira and Martins 2011) have reported that the most widely used models applied to explain technology acceptance by users are Davis’s (1989) TAM and Venkatesh et al.’s (2003) UTAUT.

The TAM was developed by Davis (1989) to create a methodology that could measure users’ acceptance of new technological systems before they are implemented. This model is thus well suited to assessments of the adoption of technologies, especially computer systems. Despite the widespread use of the TAM, it has been criticised by multiple authors (Benbasat and Barki 2007). More specifically, critics have focused on problems such as limited explanatory and predictive power, triviality and low practical value (Chuttur 2009), as well as the omission of key factors such as cost and certain structural imperatives (Lunceford 2009).

In response to these criticisms, Venkatesh and Davis (2000) presented an expanded TAM (i.e. the TAM2) that includes social influence (i.e. subjective norms, voluntariness and image) and cognitive instrumental processes (i.e. job relevance, output quality, result demonstrability and perceived ease of use). Venkatesh and Bala (2008) also developed a third version of the TAM (i.e. the TAM3) that adds the explanatory constructs of computer self-efficacy, perceived external control, computer anxiety, computer playfulness, perceived enjoyment and objective usability.

The UTAUT and UTAUT2 are, however, the models most often referred to in more recent research (Tamilmani et al. 2021). The UTAUT is a technology acceptance framework formulated by Venkatesh et al. (2003) that seeks to explain users’ intention to use technologies and their behaviours while using them. The UTAUT covers four key factors: performance expectancy, effort expectancy, social influence and facilitating conditions. Venkatesh et al. (2012) subsequently introduced the UTAUT2 model, which incorporates three new factors related to end-user contexts: hedonic motivation, price value and habit.

Similar to scholars in other fields, healthcare technology researchers have most often applied the UTAUT and UTAUT2. More specifically, the UTAUT2 has been used in studies unrelated to VR, namely, research on the adoption of electronic storage systems to keep information about patients’ health (Alazzam et al. 2016, 2015; Tavares and Oliveira 2014), wearables that control patient information (Jin and Ahn 2019), cell phones (Ameri et al. 2020) and applications that control child vaccination (Algahtani et al. 2021).

The present literature review (see Table 1), however, revealed that few studies have concentrated on behavioural intention to adopt VR in medical learning. Only some recent investigations have associated this intention with the perceived usefulness and ease of use of VR technologies, and previous research has mainly relied on the TAM (Cabero-Almenara et al. 2023; Sprenger and Schwaninger 2021). Désiron et al. (2022) and Ustun et al. (2020, 2022) are the only authors who have used the UTAUT to analyse factors affecting VR adoption in medical learning. However, they stopped short of examining the relationships between these variables.

Table 1 Prior studies of doctors and medical students’ intention to use VR in their learning and training programmes

Research on technology acceptance in all sectors has applied the TAM and UTAUT as originally conceived (Murillo et al. 2021) or used a combination or an adaptation of these models (Kayali and Allaraj 2020) to fit technological contexts with new factors or links between these. These two aspects similarly need to be considered when discussing the acceptance of VR in medical training contexts. Nonetheless, no previous studies have used the UTAUT2 to assess how well VR applications meet needs in healthcare learning from the students’ perspective, and no investigations have combined the UTAUT2 with the TAM in any of its versions.

3 Hypothesis development

Disentangling the factors affecting VR adoption and use in medical education is crucial because this technology provides new practical training tools that medical specialists can introduce into their learning processes (Georgieva-Tsaneva and Servezova 2020). The literature review covered in the previous section revealed that a wide range of models have already been developed to explain the acceptance of different learning technologies (e.g. mobile applications, AI or augmented reality) and users’ continuance intention (Alzahrani 2020; Criollo et al. 2021; Lu and Yang 2014). In addition, the review confirmed the scarcity of previous research on healthcare learners’ acceptance of VR that was based on the UTAUT and TAM.

The UTAUT2 and the TAM, TAM2 and TAM3 have aggregated, and exploited the synergies between, many constructs that are positively related to technology adoption and that can also play a critical role in learning processes. The present study thus took the UTAUT2 and TAM3 as a starting point for its hypotheses regarding the identified variables, as discussed in the following subsections.

3.1 Performance expectancy

Performance expectancy, coded as PER in the analyses, refers to learners’ belief that using VR training technology is helpful (Azizi et al. 2020). Venkatesh et al. (2012) define performance expectancy as ‘the degree to which an individual believes that using a system will help him or her to gain profit... [through improved] performance’ (p 159). Many previous studies have confirmed that performance expectancy has a positive impact on behavioural intentions (Chao 2019; Chen and Hwang 2019; Huang 2020; Mtebe and Raisamo 2014). The current research thus included the following hypothesis:

H1

Performance expectancy has a significant effect on medical learners’ intention to use VR technology.

3.2 Effort expectancy

Venkatesh et al. (2012) conceptualise effort expectancy – coded as EE in the present proposed model – as ‘the degree of simplicity and ease of use of a system’ (p 159). According to Azizi et al. (2020), this construct overlaps with ease of use. Prior research has also found a positive relationship between effort expectancy and behavioural intention (Chao 2019; Dajani and Hegleh 2019; Raza et al. 2021). Therefore, the current study added a second hypothesis:

H2

Effort expectancy has a significant effect on medical learners’ intention to use VR technology.

3.3 Social influence

Venkatesh et al. (2012) define social influence – coded as SI in the current research – as ‘the degree to which an individual perceives that others (e.g. peers and faculty members) believe he or she should use a modern system or a new approach in learning’ (p 159). Previous studies have shown that social influence has a significant influence on behavioural intention (Al-Gahtani 2016; El-Masri and Tarhini 2017; Moorthy et al. 2019). The third hypothesis developed for the present research reflected the above findings:

H3

Social influence has a significant effect on medical learners’ intention to use VR technology.

3.4 Facilitating conditions

The term facilitating conditions – coded as FC in subsequent analyses – ‘refers to consumers’ perceptions of the resources and support available to perform a behavior’ (Venkatesh et al. 2012, p 159). Prior investigations have confirmed a positive relationship exists between facilitating conditions and behavioural intention (Masadeh et al. 2016; Moorthy et al. 2019; Nikou and Economides 2017; Shen et al. 2019a, b). Thus, the fourth hypothesis created for the current study was worded as follows:

H4

Facilitating conditions have a significant effect on medical learners’ intention to use VR technology.

3.5 Hedonic motivation

Venkatesh et al. (2012) conceptualise hedonic motivation – coded as HM in the present proposed model – as ‘the user’s fun or pleasure... [obtained from] using... technology’ (p 161). Previous studies have found a positive connection between hedonic motivation and behavioural intention (Al-Azawei and Alowayr 2020; Moorthy et al. 2019). The current research thus included the following hypothesis:

H5

Hedonic motivation has a significant effect on medical learners’ intention to use VR technology.

3.6 Price value

Venkatesh et al. (2012) define price value – coded as PV in the present analyses – as ‘consumers’ cognitive trade off between the perceived benefits of the applications and the monetary cost of using them’ (p 161). Prior research has verified that a positive link is present between price value and behavioural intention (Al-Azawei and Alowayr 2020; Lewis et al. 2013). The sixth hypothesis developed for the current study reflected the above findings:

H6

Price value has a significant effect on medical learners’ intention to use VR technology.

3.7 Habit

Habit has been defined as the extent to which people tend to perform learned behaviours automatically (Limayem et al. 2007). Habit has been operationalised in two distinct ways (Venkatesh et al. 2012, p 161): as prior behaviour (Kim et al. 2005) and as the extent to which individuals believe their behaviour is automatic (Kim et al. 2005; Limayem et al. 2007). Previous studies have confirmed a positive relationship exists between habit and behavioural intention (Baptista and Oliveira 2015; Morosan and Defranco 2016; Yahia et al. 2018). The final hypothesis formulated for the current research was thus as follows:

H7

Habit has a significant effect on medical learners’ intention to use VR technology.

4 Methodology

A mixed-methods approach and sequential research design were chosen to adjust the TAM and UTAUT2 factors so that they more adequately reflect VR adoption in medical contexts, as well as to achieve the proposed objectives (Onwuegbuzie et al. 2009) (see Fig. 1). The variables were first selected based on a qualitative study. In the second phase, a survey was conducted, and the data were analysed using quantitative techniques. All the steps of both phases are shown in Fig. 1.

Fig. 1 Methodological approach

The qualitative research facilitated a more accurate formulation of the hypotheses and identification of the variables included in the conceptual model. Mixed-methods research is seen as an important third type of methodology because, by combining quantitative and qualitative approaches, it provides expanded opportunities to explore relatively unknown or understudied topics (Moscoloni 2005; Rocco et al. 2003; Tashakkori and Teddlie 2003).

4.1 Qualitative study

The current qualitative research was conducted by following the steps suggested by Becker et al. (2002), namely, in-depth interviews of individual experts and codification and classification of the ideas provided. Grounded theory guidelines were followed when the medical experts’ interviews were scheduled. The interviewees thus comprised three medical interns and two medical students with more than three years of researching VR education and medical training (Glaser and Strauss 2017).

In step one, in-depth interviews were first scheduled and then conducted in these experts’ place of work. The interviews began with an introduction to the variables mentioned in the literature on the TAM and UTAUT that can affect individuals’ acceptance and use of a technology, after which the interviewees were invited to comment on these factors or propose new ones. In step two, the experts’ opinions were recorded, compiled and analysed.

Step three of the qualitative phase consisted of identifying which variables should be included in the quantitative phase. The interviewees’ answers were codified and classified to ensure more empirically robust results. The only factors subjected to subsequent analyses were those fully supported by the majority of the experts (i.e. those reaching the point of theoretical saturation). More specifically, factors were only selected if at least four out of the five experts interviewed agreed that the variables were significant.
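The selection rule described above (retain a factor only when at least four of the five experts consider it significant) can be illustrated with a minimal sketch. The factor names and vote patterns below are hypothetical placeholders and do not reproduce the interview data; only the four-out-of-five threshold comes from the text.

```python
from typing import Dict, List

# Threshold taken from the text: at least four of the five experts must agree.
MIN_AGREEMENT = 4


def select_factors(votes: Dict[str, List[bool]], threshold: int = MIN_AGREEMENT) -> List[str]:
    """Return the factors that at least `threshold` experts judged significant."""
    return [factor for factor, ballots in votes.items() if sum(ballots) >= threshold]


# Hypothetical votes (True = expert judged the factor significant).
# These patterns are illustrative only, not the study's data.
example_votes = {
    "factor_a": [True, True, True, True, False],    # 4 of 5: retained
    "factor_b": [True, True, False, False, False],  # 2 of 5: dropped
    "factor_c": [True, True, True, True, True],     # 5 of 5: retained
}

selected = select_factors(example_votes)
print(selected)
```

Encoding the threshold as a named constant keeps the decision rule explicit and makes it straightforward to check how sensitive the retained set is to a stricter or looser agreement level.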

The findings showed that the interviewees lacked any knowledge about how VR can be applied in medical training, and only one individual was familiar with this technology outside of educational settings. The experts thus all agreed that the variable habit did not appear to make much sense in the research context, so this factor was excluded from the quantitative analysis. In contrast, the variable performance expectancy was considered important because the interviewees wanted to know how effective this type of tool would be in their training. More specifically, they were quite interested in how VR could shorten their learning curve. This reaction suggested that performance expectancy could be measured as a variable related to learning time.

Effort expectancy is also a key factor in most explanatory models in the existing literature. All the interviewees agreed that the limitations of VR technology in terms of its comfort, accessibility and usability need to be clear as these variables could either constitute obstacles or facilitate VR usage. With regard to effort, the results also revealed an interest in the interactivity this technology could contribute to training courses. The experts were specifically interested in VR tools’ potential for gamification since the interviewees had experienced this aspect in videogames. This idea was included in the research model as a new factor focused on the entertainment value of this technology in educational processes, which was labelled ‘perceived entertainment’ and coded as PEN.

Facilitating conditions was also included in the model since all the interviewees agreed that the main course structure should fully support VR technology. The experts underlined that, in this setting, the most important component is the university or hospital providing the employees’ training. Social influence was also considered by four out of five interviewees to be a determinant in this context. These experts asserted that, in the medical profession, their own behaviour is affected to a greater or lesser extent by colleagues’ methods, tools and conduct. The interviewees also mentioned that a hierarchy exists in their job so that the comportment of upper-level staff members seen as role models can noticeably affect their subordinates’ attitudes. This second feature reinforced the need to include a construct to measure social influence.

The interviewers additionally elicited information regarding a possible component of hedonism by exploring how important pleasure and fun are to the experts in their medical practice. All the interviewees agreed that learning activities, in general, need to heighten their level of enjoyment, so this is a significant factor when choosing to use or reject VR technology.

Another variable that produced notable results was price. The research model included price value because the experts emphasised the importance of the final price of this technology. For example, they were quite interested in how much VR tools might affect the overall cost of their training courses. As noted previously, the interviewees’ comments highlighted that none of them had used VR in their education and that few of their colleagues were familiar with VR and even fewer knew about its use in learning processes.

In summary, the factor of habit was removed from the conceptual model because of both the qualitative findings and the final theoretical framework adopted, even though this variable appears in the UTAUT2. A new factor, perceived entertainment (coded as PEN), was added to the proposed model in subsequent analyses.

These decisions were based on the interview results as the experts considered habit unimportant, while perceived entertainment was thought to be a significant motivator of VR usage in medical training. The existing literature on perceived entertainment defines this factor as ‘the extent to which the activity of using the technology is perceived to be enjoyable in its own right, apart from any performance consequences that may be anticipated’ (Davis et al. 1992, p 1113). Since the TAM3 was developed (Venkatesh and Bala 2008), multiple researchers have reported evidence of a positive relationship between perceived entertainment and behavioural intention to use in learning processes (Alyoussef 2021; Chao 2019).

Therefore, all the hypotheses were maintained except for H7, which was replaced by H7b. The latter was formulated as follows:

H7b

Perceived entertainment has a significant effect on medical learners’ intention to use VR technology.

Figure 2 shows the final research model and the relationships incorporated.

Fig. 2 Research model based on unified theory of acceptance and use of technology (UTAUT)2, technology acceptance model (TAM)3 and present qualitative study. Note. H = hypothesis

4.2 Quantitative study

The final version of the questionnaire created for this research included items assessing each of the factors taken from various TAMs that had been expanded or modified to reflect the results of qualitative studies. Three items were taken from Sung et al.’s (2015) work as indicators of performance expectancy (i.e. PER), as well as three items assessing effort expectancy (i.e. EE). Three items were drawn from Rese et al.’s (2017) scale for perceived entertainment (i.e. PEN). Facilitating conditions (i.e. FC) was measured with three items adopted from Attuquayefio and Addo (2014). Four indicators of social influence (i.e. SI) were taken from Tu et al.’s (2014) research. Hedonic motivation (i.e. HM) was assessed using three items drawn from Molinillo et al.’s (2017) work, and two indicators of price value (i.e. PV) were adopted from Sung et al.’s (2015) scale. Finally, the present questionnaire also included one measure of behavioural intention (i.e. BI) taken from Venkatesh et al.’s (2012) study.

The survey thus contained a total of 22 items pertaining to constructs taken primarily from the UTAUT2 and TAM3. The responses to the items were measured on a Likert-type scale that ranged from 7 (‘Very likely’) to 1 (‘Quite unlikely’). This scale has been used by most studies of users’ opinions and attitudes (Joshi et al. 2015). The questionnaire further elicited information about the respondents’ identity and sociodemographic characteristics.
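As a quick consistency check, the item counts reported for each construct can be tallied against the stated total of 22. The sketch below takes the construct codes and counts from the questionnaire description; the variable and function names are illustrative assumptions, and the response check simply encodes the seven-point range.

```python
# Items per construct, as described for the final questionnaire.
ITEMS_PER_CONSTRUCT = {
    "PER": 3,  # performance expectancy
    "EE": 3,   # effort expectancy
    "PEN": 3,  # perceived entertainment
    "FC": 3,   # facilitating conditions
    "SI": 4,   # social influence
    "HM": 3,   # hedonic motivation
    "PV": 2,   # price value
    "BI": 1,   # behavioural intention
}

# Seven-point Likert-type scale: 1 ('Quite unlikely') to 7 ('Very likely').
LIKERT_MIN, LIKERT_MAX = 1, 7


def total_items(items: dict) -> int:
    """Sum the number of questionnaire items across all constructs."""
    return sum(items.values())


def is_valid_response(value: int) -> bool:
    """Check that a response is an integer within the seven-point range."""
    return isinstance(value, int) and LIKERT_MIN <= value <= LIKERT_MAX


print(total_items(ITEMS_PER_CONSTRUCT))
```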

Before the fieldwork began, the questionnaire was sent to seven education experts, who checked if the wording was clear and the items covered the variables to be measured. These specialists’ feedback was used to refine three terms and two items in the questionnaire.

The research population comprised medical students and licensed physicians at Spanish universities and hospitals, which was considered an infinite population for sampling purposes. A representative sample of 154 individuals was gathered that fit these parameters. The fieldwork was conducted during August and September 2021 using personal interviews or self-administered surveys. The data were collected entirely in hospitals and university medical centres in Andalusia, Valencia, Castile-La Mancha, Extremadura, Murcia, Madrid, Catalonia and the Basque Country. This study obtained ethical approval from the ethics committee of the researchers’ university in order to ensure compliance with established ethical guidelines.

The questionnaires were distributed using convenience sampling at the selected locations. The interviewer first asked the respondents why they were in the hospital or university in order to screen out ineligible individuals. If they were there for medical training, they could answer the questionnaire. Participation was voluntary, and all the individuals included in the sample signed an informed consent form before taking part.

The survey answers were automatically placed in online folders. The respondents’ demographic characteristics, medical field and VR experience in their medical training were included in the dataset (see Table 2).

Table 2 Sample profile

The final step in the quantitative phase was a statistical analysis of the data using partial least squares (PLS) and SEM because the main objective was to identify causal relationships between the variables and behavioural intention (Ringle et al. 2015). The proposed model was evaluated in a two-step procedure that followed Chin’s (1998) guidelines. First, the outer (i.e. measurement) model was validated by assessing the reliability, convergent validity and discriminant validity of the constructs (Barroso et al. 2010). This model comprised the variables selected by the experts interviewed in the qualitative phase.

Next, the inner (i.e. structural) model was analysed to examine the hypothesised relationships between the constructs and assess the overall predictive capability of the proposed model. This step used the methodology outlined by Hair et al. (2014). The results section below provides more detailed information about the process of validating both the measurement and structural models.

5 Results

As shown in Fig. 2 above, the qualitative component of the present study produced a list of factors selected by the experts interviewed: performance expectancy, effort expectancy, social influence, facilitating conditions, hedonic motivation, price value and perceived entertainment. These variables were included in the measurement and structural models validated using PLS-SEM to ensure the research objectives were achieved.

5.1 Measurement model estimation

Various evaluation measures were used to assess the goodness of fit of the model. These methods included Cronbach’s alpha, composite reliability and Hair et al.’s (2014) convergent validity test.

5.2 Common method bias (CMB)

The issue of CMB was dealt with by applying two statistical control methods. First, Harman’s single-factor test was conducted using principal component analysis to confirm the presence or absence of CMB. The results indicate that less than 50% of the total variance is explained by any one factor, thereby verifying that no CMB is present (Fuller et al. 2016; Podsakoff et al. 2012).
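The logic of Harman’s single-factor test can be illustrated with a minimal sketch. It assumes the check is run on the item correlation matrix, in which case the share of total variance captured by the first unrotated principal component equals the largest eigenvalue divided by the number of items. The function name `first_component_share` and the toy correlation values are hypothetical and are not taken from the present dataset or from the software used in this study. Power iteration is used only to keep the example free of external dependencies.

```python
import math
import random


def first_component_share(corr, iterations=1000, seed=0):
    """Share of total variance explained by the first principal component.

    `corr` is a square item correlation matrix (list of lists). The total
    variance of standardised items equals the number of items, so the share
    is the largest eigenvalue divided by that number.
    """
    p = len(corr)
    rng = random.Random(seed)
    # Random start vector so it is not orthogonal to the dominant eigenvector.
    v = [rng.uniform(0.5, 1.5) for _ in range(p)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    cv = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
    largest_eigenvalue = sum(v[i] * cv[i] for i in range(p))
    return largest_eigenvalue / p


# Hypothetical correlations among three questionnaire items.
toy_corr = [
    [1.00, 0.20, 0.10],
    [0.20, 1.00, 0.15],
    [0.10, 0.15, 1.00],
]
share = first_component_share(toy_corr)
print(f"first component share: {share:.3f}")
print("below the 50% threshold:", share < 0.5)
```

A share below 0.5 corresponds to the criterion stated above, namely, that no single factor explains the majority of the total variance.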

Next, the full-collinearity test recommended by Kock (2015) was performed, which revealed low variance inflation factor values ranging from 1.000 to 2.185 for all latent constructs. These values fall below the threshold of 5, so collinearity is not an issue (Hair et al. 2011). The present results thus indicate that CMB did not distort the findings.
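The variance inflation factors used in a full-collinearity test can be sketched as follows. The example assumes that each latent variable is regressed on all the others, in which case the variance inflation factor of a variable equals the corresponding diagonal element of the inverse of the correlation matrix, that is, 1 / (1 − R²). The helper names `invert` and `full_collinearity_vifs` and the correlation values are hypothetical illustrations rather than the procedure or data of the present study.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity matrix: [A | I] is reduced to [I | A^-1].
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def full_collinearity_vifs(corr):
    """VIF of each variable regressed on all the others (diagonal of corr^-1)."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Hypothetical correlations among three latent variable scores.
latent_corr = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
vifs = full_collinearity_vifs(latent_corr)
print([round(v, 3) for v in vifs])
print("all below the threshold of 5:", all(v < 5 for v in vifs))
```

A variable that is uncorrelated with all the others has a value of exactly 1, which is the lower bound of the statistic.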

5.3 Reliability and validity assessment

To assess the measurement model, the first step was to determine the reliability of each item (see Table 3). Indicators can be accepted as valid measures of a construct if their loadings on that variable are greater than or equal to 0.707 (Barclay et al. 1995; Hair et al. 2011), which implies that the variance shared between the construct and its indicators is larger than the error variance (Carmines and Zeller 1979). In the present study, all the indicators have loadings above 0.707, except for the items PEN2, SI1 and SI2, each of which has a value below 0.6. All three were removed from their respective scales, so perceived entertainment and social influence were subsequently each assessed using two items.

Table 3 Loading, composite reliability (CR) and average variance extracted (AVE) values

Composite reliability was checked to ensure the internal consistency of all the constructs incorporated into the model, thereby providing an empirically rigorous evaluation of how well the relevant indicators measure the same latent variable. Composite reliability values are considered satisfactory when they fall between 0.60 and 0.70 in exploratory research, although studies further along in the research process must meet a more stringent standard of between 0.70 and 0.90 (Nunnally and Bernstein 1994). Values below 0.60 indicate a lack of reliability (Hair et al. 2011; Richter et al. 2016). The analysis produced values well above the minimum of 0.70 for all the constructs, which confirms strong internal consistency and indicates that the constructs can be considered reliable (see Table 3 above).

The last step was to evaluate the convergent and discriminant validity of the measurement model. Convergent validity requires a confirmation that each set of indicators represents or measures a single underlying construct, which can be determined by checking if the construct is unidimensional (Henseler et al. 2009). An average variance extracted (AVE) value of over 0.50 shows that more than 50% of the variance of a construct is explained by its indicators (Hair et al. 2011, 2014). The constructs included in the current model all have AVE values greater than 0.5 except for social influence (0.432), so this variable had to be excluded (see Table 3 above).
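The two statistics discussed above can be computed directly from standardised indicator loadings, as the following minimal sketch shows. It assumes the usual formulas, in which AVE is the mean of the squared loadings and composite reliability is the squared sum of the loadings divided by that quantity plus the summed error variances (1 − λ²). The loadings are hypothetical values chosen for illustration and are not those reported in Table 3. The 0.707 loading threshold mentioned earlier follows from the same logic because 0.707² ≈ 0.50.

```python
def average_variance_extracted(loadings):
    """Mean of the squared standardised loadings of a construct's indicators."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Composite reliability from standardised loadings."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical loadings for two constructs.
strong = [0.80, 0.80, 0.80]
weak = [0.60, 0.70]

for name, loadings in (("strong", strong), ("weak", weak)):
    ave = average_variance_extracted(loadings)
    cr = composite_reliability(loadings)
    print(f"{name}: AVE = {ave:.3f} (> 0.50: {ave > 0.5}), CR = {cr:.3f}")
```

In this illustration, the second construct falls short of the 0.50 AVE criterion even though neither of its loadings is very low, which is the kind of situation that leads to a construct being excluded.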

Discriminant validity, in turn, ascertains to what extent a specific construct is different from other variables in the relevant model, which helps expose possible problems with overlapping. That is, each construct should share more variance with its own indicators than with the other variables in the model (Barclay et al. 1995). In the present research, this step in the analysis comprised three procedures. The first involved applying Fornell and Larcker’s (1981) criterion, which requires that latent constructs share more variance with their assigned indicators than with the other latent variables in the present model. In statistical terms, the AVE of each latent construct should be greater than the variance it shares with other factors in the model (Barclay et al. 1995; Hair et al. 2011; Henseler et al. 2009; Richter et al. 2016). Thus, another way to confirm discriminant validity is to check if the intercorrelation values of the constructs are lower than the square roots of AVE (see Table 3 above).
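The Fornell and Larcker comparison can be expressed as a short check. The sketch below assumes that the AVE of each construct and the correlations between constructs are already available. The function name `fornell_larcker_violations`, the construct labels A, B and C and all the values are hypothetical and do not reproduce the present results.

```python
import math


def fornell_larcker_violations(ave, correlations):
    """Return construct pairs that fail the Fornell-Larcker criterion.

    A pair fails when the absolute correlation between the two constructs is
    not lower than the square root of the AVE of either construct.
    """
    violations = []
    for (a, b), r in correlations.items():
        if abs(r) >= min(math.sqrt(ave[a]), math.sqrt(ave[b])):
            violations.append((a, b))
    return violations


# Hypothetical AVE values and inter-construct correlations.
ave = {"A": 0.64, "B": 0.49, "C": 0.81}
correlations = {("A", "B"): 0.50, ("A", "C"): 0.85, ("B", "C"): 0.30}

print(fornell_larcker_violations(ave, correlations))
```

An empty list would indicate that every construct shares more variance with its own indicators than with any other construct in the model.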

The second procedure used to determine discriminant validity is less stringent, namely, checking for cross loadings. The indicators should more strongly load on or correlate with their own construct than with the other latent variables (Hair et al. 2011; Henseler et al. 2009). This technique was applied earlier during the analysis of the discriminant validity of each item (see Table 2 above).

The last procedure was to estimate the heterotrait-monotrait (HTMT) ratio (Henseler et al. 2015). The cited authors found in simulation studies that inadequate discriminant validity could best be detected with this ratio. Discriminant validity is present if the correlations between the indicators that measure the same construct (i.e. monotrait-heteromethod) are stronger than the correlations between the indicators that assess different variables (i.e. heterotrait-heteromethod). The HTMT ratio should thus be less than 1, although Gold et al. (2001) recommend a more conservative value of 0.90. Resampling or bootstrapping can also be used to confirm if the HTMT ratio diverges significantly from 1 by constructing a confidence interval. The criterion established for this confidence interval is again that its values must be less than 1 in order to demonstrate discriminant validity.
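The HTMT ratio for a pair of constructs can be sketched as the mean heterotrait-heteromethod correlation divided by the geometric mean of the two constructs’ mean monotrait-heteromethod correlations. The example below assumes that all item correlations are positive and that each construct has at least two indicators. The function name `htmt` and the correlation matrix are hypothetical and are not drawn from Table 4.

```python
import math
from itertools import combinations, product
from statistics import fmean


def htmt(item_corr, items_a, items_b):
    """HTMT ratio for two constructs given an item correlation matrix.

    `items_a` and `items_b` list the row/column indices of the items that
    belong to each construct (at least two items per construct).
    """
    # Heterotrait-heteromethod: correlations between items of different constructs.
    hetero = fmean(item_corr[i][j] for i, j in product(items_a, items_b))
    # Monotrait-heteromethod: correlations among items of the same construct.
    mono_a = fmean(item_corr[i][j] for i, j in combinations(items_a, 2))
    mono_b = fmean(item_corr[i][j] for i, j in combinations(items_b, 2))
    return hetero / math.sqrt(mono_a * mono_b)


# Hypothetical correlations among four items: items 0-1 measure one construct
# and items 2-3 measure another.
item_corr = [
    [1.0, 0.8, 0.4, 0.4],
    [0.8, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.5],
    [0.4, 0.4, 0.5, 1.0],
]
ratio = htmt(item_corr, items_a=[0, 1], items_b=[2, 3])
print(f"HTMT = {ratio:.3f}; below 0.90: {ratio < 0.90}")
```

Repeating this calculation on bootstrap resamples of the raw data would yield the confidence interval described above.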

The present analysis confirmed that the variables have discriminant validity except for hedonic motivation and performance expectancy. According to Henseler et al. (2015), better results can be obtained by eliminating the items that have weaker correlations with their constructs – in this case, items belonging to hedonic motivation and performance expectancy. The discriminant validity results for performance expectancy improved significantly after PER3 was eliminated, so only this item was removed from that scale, whereas all the hedonic motivation items were dropped from subsequent analyses. The final version of the measurement model has satisfactory discriminant validity for all the remaining constructs as shown in Table 4.

Table 4 Correlations among latent variables and discriminant validity of first-order constructs (heterotrait-monotrait [HTMT] ratio)

5.4 Structural model assessment and hypothesis testing

After the measurement model was refined, the next step was to validate the structural model using PLS analysis in order to confirm the structural relationships between the variables included in the hypotheses. The model was evaluated based on its R² values. The Stone-Geisser test was run (see Table 5) to confirm predictive power (i.e. acceptable Q² scores), and the path coefficients were calculated (Cepeda-Carrión and Roldán Salgueiro 2004). The stability of the values was evaluated based on the t-statistic, using bootstrapping with 5000 resamples (see Table 5).
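For reference, the two statistics mentioned above are conventionally defined as shown below. The notation is generic and is an assumption rather than the source’s own: y_i denotes the observed scores of the endogenous construct, ŷ_i the predicted scores and ȳ their mean, while SSE_D and SSO_D are the sums of squared prediction errors and of squared observations in omission round D of a blindfolding procedure. The omission distance used in this study is not stated.

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
Q^2 = 1 - \frac{\sum_D \mathrm{SSE}_D}{\sum_D \mathrm{SSO}_D}
```

Under these definitions, a Q² value above zero is usually read as evidence that the model has predictive relevance for the endogenous construct.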

Table 5 Variance explained and Stone-Geisser test

The final step was to check the standardised regression weights in order to confirm if the proposed hypotheses are statistically significant (Hair et al. 2011). To validate the previous findings, a nonparametric method of confidence intervals was applied. According to Henseler et al. (2009, p 306), ‘if a confidence interval for a path coefficient’s estimated β [beta weight] does not include zero, the hypothesis that β is equal to zero must be rejected.’ The present study followed this procedure, thereby confirming the previous results (see Table 6). The path coefficient linking perceived entertainment and behavioural intention (β = 0.413; t = 2.512; p = 0.016) is positive and statistically significant. Therefore, this relationship is supported.
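The confidence interval rule quoted above can be illustrated with a percentile bootstrap. The sketch below is a simplified stand-in rather than a PLS estimation: it assumes a single standardised predictor, in which case the standardised coefficient equals Pearson’s correlation, whereas a full path model would be re-estimated on every resample. The function names, the simulated data and the effect size of 0.4 are hypothetical; only the sample size of 154 and the 5000 resamples mirror the figures reported in this study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile_bootstrap_ci(x, y, n_boot=5000, alpha=0.05, seed=42):
    """Percentile bootstrap CI for the standardised slope of y on x."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        xs, ys = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # skip degenerate resamples with zero variance
        estimates.append(pearson(xs, ys))
    estimates.sort()
    k = round(alpha / 2 * n_boot)  # number of estimates cut from each tail
    return estimates[k], estimates[n_boot - 1 - k]


# Simulated scores for 154 hypothetical respondents.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(154)]
y = [0.4 * a + rng.gauss(0, 1) for a in x]

lower, upper = percentile_bootstrap_ci(x, y)
excludes_zero = lower > 0 or upper < 0
print(f"95% CI: [{lower:.3f}, {upper:.3f}]; excludes zero: {excludes_zero}")
```

If the interval excludes zero, the hypothesis that the coefficient equals zero is rejected, which is the decision rule applied to each path in the structural model.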

Table 6 Structural model results

The last step in validating the structural model was to examine its robustness by focusing on nonlinear effects, endogeneity and unobserved heterogeneity. To identify potential nonlinearities, interaction terms were incorporated to represent the quadratic effects of PER, EE, FC, PV and PEN on BI. A bootstrapping procedure with 5000 resamples confirmed that no significant nonlinear effects are present.

To check for endogeneity, the Gaussian copula approach suggested by Park and Gupta (2012) was applied. The copula values for each construct are statistically non-significant (p > 0.05), which indicates the absence of any endogeneity effect. Finally, unobserved heterogeneity was assessed by following the finite mixture PLS procedure and Sarstedt et al.’s (2020) guidelines. Fit indices such as the modified Akaike information criterion with factor 3 (AIC3) and the consistent AIC support different segment solutions, with a normalised error value of over 0.50. The results are thus inconclusive regarding the segment solution, which suggests that unobserved heterogeneity has no significant impact on the research outcomes. The results shown in Fig. 3 confirm that the proposed structural model is valid and robust and that perceived entertainment (i.e. H7b) predicts medical learners’ intention to use VR technology.

Fig. 3 Structural model with t-values

Statistical tests and confidence intervals can also be used to draw useful conclusions about research population parameters. The percentile bootstrap method, in particular, is recommended for constructing these confidence intervals. The path coefficient estimates for the hypothesised relationships included in the model range from –0.050 to 0.413. The path coefficients are all statistically non-significant at the 5% significance level except for that of PEN. More specifically, increasing PEN by one standard deviation increases intention to use VR by 0.413 standard deviations if all other variables are kept constant. Thus, the results for the model reveal a significant impact of perceived entertainment on intention to use VR, that is, intention to use VR in medical training is greater when perceived entertainment is higher.

The above findings are corroborated by Agudo-Peregrina et al. (2014), who also found that a strong relationship exists between perceived entertainment and intention to use VR for training. In addition, perceived entertainment has a positive impact on willingness to use VR in training programmes (Alyoussef 2021). Previous research on intention to use VR has further confirmed a significant positive link between students’ perceived entertainment and their behavioural intentions (Chao 2019).

The data analysis results thus indicate that the remaining constructs in the model have no significant relationship with intention to use VR in medical training. This finding is inconsistent with the earlier qualitative results, which could be explained by the respondents’ characteristics or the small sample size.

6 Discussion and conclusions

The qualitative interview data indicated that all the variables examined in the existing literature – except for habit – help explain acceptance of VR technology, as well as adding more details about some determinant factors. However, the quantitative study validated only one variable: perceived entertainment. This result means that the proposed model is much simpler than most approaches previously recommended by scholars. Notably, more complex models are not necessarily more streamlined or elucidatory, and more sophisticated explanatory models have been criticised by Bagozzi (2007) and Li (2020) for being overly complicated.

The present findings support Hu et al.’s (1999) rejection of the TAM as an explanatory model of acceptance of VR technologies in medical education because the single driver validated by the current study does not coincide with the multiple determinants included in the TAM. The proposed model is instead in line with the extant literature on technology acceptance in education (Agudo-Peregrina et al. 2014; Wong et al. 2023) given that the present results best fit the TAM3. This study also adds to previous research on factors that specifically influence intention to use technology (Agudo-Peregrina et al. 2014; Alyoussef 2021; Chao 2019), which has found that perceived entertainment is a key determinant of behavioural intention, in this case, to use VR in medical training.

6.1 Theoretical contributions and managerial implications

The present research went a step further and developed a model that integrates the TAM3 and UTAUT2 factors that experts on VR and medical education agree are determinants of VR adoption. The proposed approach contributes to the literature on VR in medical education by more effectively capturing the complexity of implementation and usage issues. In addition, this model is the first to confirm that perceived entertainment is a determinant of medical interns and students’ acceptance of VR applications.

From a methodological perspective, the mixed-methods approach also created a more empirically robust, context-specific framework for understanding and predicting users’ behaviour in this research setting. Regarding practical implications, the importance of perceived entertainment to individuals’ acceptance of this technology constitutes valuable information for education companies and organisations as this factor most clearly influences doctors and medical students’ intention to use VR applications. Universities and hospitals can more successfully incorporate this technology into their medical training programmes by generating pleasant experiences via innovative VR learning tools.

The qualitative study also highlighted other significant factors, especially performance expectancy, that suggest the best way to market VR products is to identify niche markets within healthcare and education, which seek to meet students’ needs with innovative applications. Benefits can be reaped from working with universities and organisations that make innovation an important component of their action plans as these entities will be more open to including VR in their learning processes – if these organisations have not done so already. In addition, the qualitative results indicate that social influence has a significant role, which suggests that business strategies should focus on large groups rather than individuals to foster and strengthen medical professionals’ tendency to adopt VR products.

VR training solutions already exist, especially in the field of invasive surgery (Sadeghi et al. 2020). Even though VR training solutions are highly regarded in the medical field, the current qualitative study confirmed that performance expectancy is still a crucial factor determining the use of VR in medical training. In addition, the literature raises concerns about inaccuracies in images that limit the precision of real-time representations (i.e. quality imaging) produced by VR hardware devices (Chiang et al. 2012). The cited authors’ findings confirm that, while VR has made significant progress in medical training contexts, caution has to be exercised when adopting this technology in medical education, and potential inaccuracies such as image quality issues should be considered carefully when making adoption decisions.

6.2 Limitations and future research

As in all research, the present study had limitations. The sample was quite small (Sigala 2021), so the quantitative results must be treated with caution, namely, as a first attempt to disentangle the factors that predict intention to use VR in medical training. Future research would benefit from collecting a larger sample to facilitate data analysis that can take into account diverse medical fields and age groups.

The present qualitative results suggest that data on this topic need to be processed according to participants’ specific profiles. In general, demographic variables such as gender or age are key variables when marketing products, but, in this research context, more specific variables stand out, such as medical field and prior experience with VR technology. More in-depth investigations are needed to define more effective marketing strategies. Medical specialties are particularly important to VR applications in healthcare training because quite specific proposals have to be developed and training programmes are expensive to create in terms of time and money. Market segmentation by medical field could help companies choose the most suitable target markets that would be the most interested in adopting this technology.

Finally, the fieldwork was solely grounded in a questionnaire that sought to measure the respondents’ perceptions. Valuable insights could be gained by conducting additional studies that replicate the present research using alternative qualitative (e.g. omnibus survey) and quantitative (e.g. experimentation) techniques. Investigations also need to delve further into the differences between virtual and physical training, as well as between simulated and real-world operations.